Preface



KI 2004 was the 27th edition of the annual German Conference on Artificial Intel- 
ligence, which traditionally brings together academic and industrial researchers 
from all areas of AI and which enjoys increasing international attendance. 

KI 2004 received 103 submissions from 26 countries. This volume contains 
the 30 papers that were finally selected for presentation at the conference. The 
papers cover quite a broad spectrum of “classical” subareas of AI, like natu- 
ral language processing, neural networks, knowledge representation, reasoning, 
planning, and search. When looking at this year’s contributions, it was exciting 
to observe that there was a strong trend towards actual real-world applications 
of AI technology. A majority of contributions resulted from or were motivated 
by applications in a variety of areas. For example, planning technology is being
exploited for taxiway traffic control and game playing; natural language
processing and knowledge representation are enabling advanced Web-based
information processing; and the integration of results from automated reasoning,
neural networks and machine perception into robotics is significantly improving
the capabilities of autonomous systems.

The technical programme of KI 2004 was highlighted by invited talks from 
outstanding researchers in the areas of automated reasoning, robot planning, 
constraint reasoning, machine learning, and semantic Web: Jörg Siekmann (DFKI
and University of Saarland, Saarbrücken), Malik Ghallab (LAAS-CNRS, Toulouse),
François Fages (INRIA Rocquencourt), Martin Riedmiller (University of
Osnabrück), and Wolfgang Wahlster (DFKI and University of Saarland, Saarbrücken).
Their invited papers are also presented in this volume. 

This year KI was held in co-location with INFORMATIK 2004, the German 
Conference on Computer Science, organized under the auspices of the German 
Informatics Society (GI). KI and INFORMATIK shared a joint day of invited 
presentations. The talks by Wolfgang Wahlster and Malik Ghallab were part of
this joint programme. 

A conference like KI 2004 involves the dedication of many people. First of 
all, there are the authors, who submitted their papers to the conference; there 
are the members of the program committee and the many additional reviewers, 
who worked hard on providing high-quality reviews in time and participated in 
the paper discussion process; and there is the highly supportive KI 2004 local 
arrangements committee. We are most grateful to all of them. 
Automated Reasoning Tools
for Molecular Biology 



François Fages

Projet Contraintes, INRIA Rocquencourt 
http://contraintes.inria.fr



In recent years, molecular biology has engaged in a large-scale effort to elucidate 
high-level cellular processes in terms of their biochemical basis at the
molecular level. The mass production of post-genomic data, such as RNA
expression, protein production and protein-protein interaction, raises the need
for a strong parallel effort on the formal representation of biological processes.

In this talk, we shall present the Biochemical Abstract Machine BIOCHAM
and advocate its use as a formal modeling environment for network biology.
Biocham provides a precise semantics to biomolecular interaction maps. Based 
on this formal semantics, the Biocham system offers automated reasoning tools 
for querying the temporal properties of the system under all its possible behav- 
iors. We shall review the main features of Biocham and report on our modeling 
experience with this language. In particular, we shall report on a model of the
control of the mammalian cell cycle, developed on the basis of Kohn’s map.

Biocham has been designed in the framework of the ARC CPBIO on “Process 
Calculi and Biology of Molecular Networks” [1] which aims at pushing forward 
a declarative and compositional approach to modeling languages in Systems 
Biology. Biocham is a language and a programming environment for modeling 
biochemical systems, making simulations, and checking temporal properties. It 
is composed of: 

1. a rule-based language for modeling biochemical systems, allowing patterns 
and constraints in the definition of rules; 

2. a simple simulator; 

3. a powerful query language based on Computation Tree Logic CTL; 

4. an interface to the NuSMV [2] model checker for automatically evaluating 
CTL queries. 

The use of Computation Tree Logic (CTL) [3] for querying the temporal 
properties of the system provides an alternative technique to numerical models 
based on differential equations, in particular when numerical data are
missing. The model-checking tools associated with CTL automate reasoning on all
the possible behaviors of the system modeled in a purely qualitative way. The
semantics of Biocham ensures that the set of possible behaviors of the model
over-approximates the set of all behaviors of the system corresponding to
different kinetic parameters.
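To make the idea of querying all possible behaviors concrete, here is a minimal explicit-state sketch of checking the CTL formulas EF p (some path reaches a state satisfying p) and AG p (p holds in every reachable state) over a toy Boolean model. It is not Biocham or NuSMV: the rule encoding, the simplified semantics (an enabled rule adds its products and either keeps or consumes all of its reactants), and the molecule names A to D are assumptions made for illustration. A symbolic model checker such as NuSMV represents the state space symbolically instead of enumerating it.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

State = FrozenSet[str]                        # molecules present in a Boolean state
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)
Prop = Callable[[State], bool]

def successors(s: State, rules: List[Rule]) -> Set[State]:
    """Simplified Boolean semantics: an enabled rule adds its products and
    either keeps or consumes all of its reactants (both outcomes are possible)."""
    out: Set[State] = set()
    for reactants, products in rules:
        if reactants <= s:
            out.add(s | products)
            out.add((s - reactants) | products)
    return out or {s}  # a state with no enabled rule loops on itself

def transition_system(init: State, rules: List[Rule]) -> Dict[State, Set[State]]:
    """Enumerate every state reachable from init together with its successors."""
    succ: Dict[State, Set[State]] = {}
    stack = [init]
    while stack:
        s = stack.pop()
        if s not in succ:
            succ[s] = successors(s, rules)
            stack.extend(succ[s])
    return succ

def sat_EF(succ: Dict[State, Set[State]], p: Prop) -> Set[State]:
    """Least fixpoint of Z = p or EX Z: states from which some path reaches p."""
    z = {s for s in succ if p(s)}
    changed = True
    while changed:
        changed = False
        for s, ts in succ.items():
            if s not in z and ts & z:
                z.add(s)
                changed = True
    return z

def holds_EF(init: State, rules: List[Rule], p: Prop) -> bool:
    return init in sat_EF(transition_system(init, rules), p)

def holds_AG(init: State, rules: List[Rule], p: Prop) -> bool:
    return not holds_EF(init, rules, lambda s: not p(s))  # AG p == not EF not p

# Hypothetical toy model: A + B => C and C => D, starting with A and B present.
rules = [(frozenset({"A", "B"}), frozenset({"C"})),
         (frozenset({"C"}), frozenset({"D"}))]
init = frozenset({"A", "B"})
print(holds_EF(init, rules, lambda s: "D" in s))                  # True
print(holds_AG(init, rules, lambda s: ("A" in s) == ("B" in s)))  # True
print(holds_EF(init, rules, lambda s: s == frozenset({"A"})))     # False
```

The second query is an invariant of the toy model: in this simplified semantics A and B are always consumed together, so no reachable state contains exactly one of them.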
Biocham shares several similarities with the Pathway Logic system [4]
implemented in Maude. Both systems rely on an algebraic syntax and are rule-based
languages. One difference is the use in Biocham of CTL logic which allows us 
to express a wide variety of biological queries, and the use of a state-of-the-art 
symbolic model checker for handling the complexity of highly non-deterministic 
models. 

The first experimental results of this approach for querying models of bio- 
chemical networks in temporal logic have been reported in [5, 6], on a qualitative 
model of the mammalian cell cycle control [7, 8] and in [6] on a quantitative 
model of gene expression [9]. In this talk, we describe the Biocham system, which
provides a modeling environment supporting this methodology. The full version 
of this paper will appear in [10]. 
Dedicated to Martin Davis



Abstract. The year 2004 marks the fiftieth birthday of the first com- 
puter generated proof of a mathematical theorem: “the sum of two even 
numbers is again an even number” (with Martin Davis’ implementation 
of Presburger Arithmetic in 1954). 

While Martin Davis and later the research community of automated 
deduction used machine oriented calculi to find the proof for a theorem 
by automatic means, the Automath project of N.G. de Bruijn¹, more
modest in its aims with respect to automation, showed in the late 1960s
and early 70s that a complete mathematical textbook could be coded and 
proof-checked by a computer. 

Classical theorem proving procedures of today are based on ingenious 
search techniques to find a proof for a given theorem in very large search 
spaces - often in the range of several billion clauses. But in spite of 
many successful attempts to prove even open mathematical problems 
automatically, their use in everyday mathematical practice is still limited. 
The shift from search based methods to more abstract planning tech- 
niques however opened up a new paradigm for mathematical reasoning 
on a computer and several systems of the new kind now employ a mix 
of interactive, search based as well as proof planning techniques. 

The ΩMEGA system is at the core of several related and well-integrated
research projects of the ΩMEGA research group, whose aim is to develop
system support for the working mathematician; in particular, it
supports proof development at a human-oriented level of abstraction.
It is a modular system with a central proof data structure and several
supplementary subsystems, including automated deduction and computer
algebra systems. ΩMEGA has many characteristics in common with systems
like Nuprl [ACE+00], Coq [Coq03], HOL [GM93], PVS [ORR+96],
and Isabelle [Pau94,NPW02]. However, it differs from these systems
with respect to its focus on proof planning, and in that respect it is
more similar to the proof planning systems Clam and λClam at Edinburgh
[RSG98,BvHHS90].



1 http://www.win.tue.nl/automath/
1 Introduction

The vision of computer-supported mathematics and a system which provides 
integrated support for all work phases of a mathematician (see Fig. 1) has fasci- 
nated researchers in artificial intelligence, particularly in the deduction systems 
area, and in mathematics for a long time. The dream of mechanizing (mathemat- 
ical) reasoning dates back to Gottfried Wilhelm Leibniz in the 17th century with 
the touching vision that two philosophers engaged in a dispute would one day 
simply code their arguments into an appropriate formalism and then calculate 
(Calculemus!) who is right. At the end of the 19th century, modern mathematical
logic was born with Frege’s Begriffsschrift, and important milestones in the
formalization of mathematics were Hilbert’s program and 20th-century
Bourbakism.




Fig. 1. Calculemus illustration of different challenges for a mathematical assistance
system.



With the logical formalism for the representation and calculation of
mathematical arguments emerging in the first part of the twentieth century, it
was but a small step to implement these techniques on a computer as soon as
computers were widely available. In 1954 Martin Davis’ Presburger Arithmetic
Program was reported to the US Army Ordnance, and the Dartmouth Conference in
1956 is known not only for giving birth to artificial intelligence in general but
also, more specifically, for the demonstration of the first automated reasoning
programs for mathematics by Herb Simon and Allen Newell.
However, after the early enthusiasm of the 1960s, in particular the publication
of the resolution principle in 1965 [Rob65], and the developments in the 70s, a
more sober realization of the actual difficulties involved in automating everyday
mathematics set in, and the field increasingly fragmented into many subareas,
which all developed their specific techniques and systems². Only very recently
has this trend been reversed, with the Calculemus³ and MKM⁴ communities as
driving forces of this movement. In Calculemus the viewpoint is bottom-up,
starting from existing techniques and tools developed in the community. MKM
approaches the goal of computer-based mathematics in the new millennium by a
complementary top-down approach, starting from existing, mainly pen-and-paper-based
mathematical practice down to system support.

In the following, we give an overview of the ΩMEGA project and its main
developments, and then point to current research and some future goals.

2 ΩMEGA

The ΩMEGA project represents one of the major attempts to build an
all-encompassing assistant tool for the working mathematician. It is a
representative of systems in the new paradigm of proof planning and combines
interactive and automated proof construction for domains with rich and
well-structured mathematical knowledge. The inference mechanism at the lowest
level of abstraction is an interactive theorem prover based on a higher order
natural deduction (ND) variant of a soft-sorted version of Church’s simply typed
λ-calculus [Chu40]. The logical language, which also supports partial functions,
is called POST, for partial functions and order sorted type theory. While this
represents the “machine code” of the system, which the user will seldom want to
see, the search for a proof is usually conducted at a higher level of abstraction
defined by tactics and methods. Automated proof search at this abstract level is
called proof planning (see Section 2.3). Proof construction is also supported by
already proven assertions and theorems and by calls to external systems to
simplify or solve subproblems.

2.1 System Overview 

At the core of ΩMEGA is the proof plan data structure PDS [CS00], in which
proofs and proof plans are represented at various levels of granularity and
abstraction (see Fig. 2). The PDS is a directed acyclic graph, where open nodes
represent unjustified propositions that still need to be proved and closed nodes
represent propositions that are already proved. The proof plans are developed
and classified with respect to a taxonomy of mathematical theories in the
mathematical knowledge base MBase [FK00a,KF01]. The user of ΩMEGA, the proof
planner Multi [MM00], or the suggestion mechanism ΩAnts [BS00] modifies
the PDS during proof development until a complete proof plan has been found.
They can also invoke external reasoning systems, whose results are included in
the PDS after appropriate transformation. Once a complete proof plan at an
appropriate level of abstraction has been found, this plan must be expanded
by sub-methods and sub-tactics into lower levels of abstraction until finally a
proof at the level of the logical calculus is established. After expansion of these
high-level proofs to the underlying ND calculus, the PDS can be checked by
ΩMEGA’s proof checker.

2 The history of the field is presented in a classical paper by Martin Davis [Dav83] and
also in [Dav01] and more generally in his history of the making of the first computers
[Dav65]. Another source is Jörg Siekmann [Sie92] and more recently [Sie04].

3 www.calculemus.org

4 monet.nag.co.uk/mkm/index.html




Fig. 2. The proof plan data structure PDS is at the core of the ΩMEGA system. Proof
construction is facilitated by knowledge-based proof planning (deliberative),
agent-oriented theorem proving (reactive), or by user interaction.



Hence, there are two main tasks supported by this system, namely (i) to 
find a proof plan, and (ii) to expand this proof plan into a calculus-level proof; 
and both jobs can be equally difficult and time consuming. Task (ii) employs an 
LCF-style tactic expansion mechanism, proof search or a combination of both 
in order to generate a lower-level proof object. It is a design objective of the 
PDS that various proof levels coexist with their respective relationships being
dynamically maintained. 
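The two ideas above, a directed acyclic graph of open and closed nodes and an expansion hierarchy in which several proof levels coexist, can be sketched as follows. The classes, the method names (EqSym, Rewrite, ForallE, EqSubst, ImpE) and the expansion table are hypothetical and only illustrate the bookkeeping; they do not reproduce ΩMEGA's actual PDS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass(eq=False)                         # nodes are shared, so compare by identity
class Node:
    formula: str
    method: Optional[str] = None             # None: open (unjustified) node
    premises: List["Node"] = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        return self.method is None

def open_nodes(root: Node) -> List[Node]:
    """Collect the open nodes below root, visiting shared DAG nodes once."""
    seen: Set[Node] = set()
    result: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.is_open:
            result.append(node)
        stack.extend(node.premises)
    return result

def expansion_layers(plan: List[str], expansions: Dict[str, List[str]],
                     calculus: Set[str]) -> List[List[str]]:
    """Expand abstract steps level by level, keeping every intermediate layer.
    Assumes the table is acyclic and covers every non-calculus step."""
    layers = [list(plan)]
    while any(step not in calculus for step in layers[-1]):
        nxt: List[str] = []
        for step in layers[-1]:
            nxt.extend([step] if step in calculus else expansions[step])
        layers.append(nxt)
    return layers

hyp = Node("a = b", method="Hyp")           # a closed assumption node
goal = Node("b = a")                        # the open goal
print(len(open_nodes(goal)))                # 1
goal.method, goal.premises = "EqSym", [hyp]
print(len(open_nodes(goal)))                # 0: closed at the abstract level

table = {"EqSym": ["Rewrite", "ForallE"], "Rewrite": ["EqSubst", "ImpE"]}
for layer in expansion_layers(["EqSym"], table, {"ForallE", "EqSubst", "ImpE"}):
    print(layer)
# ['EqSym']
# ['Rewrite', 'ForallE']
# ['EqSubst', 'ImpE', 'ForallE']
```

Keeping the whole list of layers, rather than overwriting the abstract step with its expansion, is what lets the abstract plan and its calculus-level refinement coexist.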

The graphical user interface LΩUI [SHB+99] (see Fig. 4) provides both a
graphical and a tabular view of the proof under consideration, and the interac- 
tive proof explanation system P.rex [Fie01b,Fie01a,Fie01c] generates a natural- 
language presentation of the proof. 

The previously monolithic system has been split up and separated into sev- 
eral independent modules, which are connected via the mathematical software 
bus MathWeb-SB [ZK02]. An important benefit is that MathWeb-SB mod- 
ules can be distributed over the Internet and are then remotely accessible by 
other research groups as well. There is now a very active MathWeb user
community, with sometimes several thousand theorems and lemmata being proven per
day. Most theorems are generated automatically as (currently non-reusable and
non-indexed) subproblems in natural language processing (see the Doris system
[Dor01]), proof planning and verification tasks.




Fig. 3. The vision of an all-encompassing mathematical assistance environment: we
have now modularized and out-sourced many of the support tools such that they can 
also be used by other systems via the MathWeb-SB software bus. 

2.2 External Systems 

Proof problems require many different skills for their solution. Therefore, it is 
desirable to have access to several systems with complementary capabilities, to 
orchestrate their use, and to integrate their results. ΩMEGA interfaces
heterogeneous external systems such as computer algebra systems (CASs), higher and
first order automated theorem proving systems (ATPs), constraint solvers (CSs), 
and model generation systems (MGs). 

Their use is twofold: they may provide a solution to a subproblem, or they 
may give hints for the control of the search for a proof. In the former case, the 
output of an incorporated reasoning system is translated and inserted as a
subproof into the PDS. This is beneficial for interfacing systems that operate at
different levels of abstraction, and also for a human-oriented display and
inspection of a partial proof. Importantly, it also enables us to check the
soundness of each contribution by expanding the inserted subproof to a
logic-level proof and then verifying it with ΩMEGA’s proof checker.
Currently, the following external systems are integrated in ΩMEGA:

CASs provide symbolic computation, which can be used in two ways: first, 
to compute hints to guide the proof search (e.g., witnesses for existential 
variables), and, second, to perform some complex algebraic computation such 
as to normalize or simplify terms. In the latter case the symbolic computation 
is directly translated into proof steps in ΩMEGA. CASs are integrated via the
transformation and translation module Sapper [Sor00]. Currently, ΩMEGA
uses the systems Maple [CGG+92] and GAP [S+95].

ATPs are employed to solve subgoals. Currently ΩMEGA uses the first order
provers Bliksem [dN99], EQP [McC97], Otter [McC94], Protein [BF94],
Spass [WAB+99], and Waldmeister [HJL99], the higher order systems TPS
[ABI+96] and LEO [BK98,Ben99], and we plan to incorporate Vampire
[RV01]. The first order ATPs are connected via Tramp [Mei00], which is
a proof transformation system that transforms resolution-style proofs into
assertion-level ND proofs to be integrated into ΩMEGA’s PDS. TPS already
provides ND proofs, which can be further processed and checked with little
transformational effort [BBS99].

MGs provide either witnesses for free (existential) variables, or counter-models, 
which show that some subgoal is not a theorem. Hence, they help to guide the 
proof search. Currently, ΩMEGA uses the model generators Satchmo [MB88]
and SEM [ZZ95].

CSs construct mathematical objects with theory-specific properties as witnesses 
for free (existential) variables. Moreover, a constraint solver can help to
reduce the proof search by checking for inconsistencies of constraints.
Currently, ΩMEGA employs CoSIE [MZM00], a constraint solver for inequalities
and equations over the field of real numbers.



2.3 Proof Planning 

ΩMEGA’s main focus is on knowledge-based proof planning [Bun88,Bun91,MS99],
where proofs are not conceived in terms of low-level calculus rules,
but at a much higher level of abstraction that highlights the main ideas and 
de-emphasizes minor logical or mathematical manipulations on formulae. 

Knowledge-based proof planning is a new paradigm in automated theorem 
proving, which swings the motivational pendulum back to its AI origins in that it 
employs and further develops many AI principles and techniques such as hierar- 
chical planning, knowledge representation in frames and control rules, constraint 
solving, tactical theorem proving, and meta-level reasoning. It differs from tradi- 
tional search-based techniques in automated theorem proving not least in its level 
of abstraction: the proof of a theorem is planned at an abstract level where an 
outline of the proof is found. This outline, that is, the abstract proof plan, can be 
recursively expanded to construct a proof within a logical calculus provided the 
proof plan does not fail. The plan operators represent mathematical techniques 
familiar to a working mathematician. While the knowledge of such a mathemat- 
ical domain as represented within methods and control rules is specific to the 
mathematical field, the representational techniques and reasoning procedures 
are general-purpose. For example, one of our first case studies [MS99] used the
limit theorems proposed by Woody Bledsoe [Ble90] as a challenge to automated 
reasoning systems. The general-purpose planner makes use of this mathematical 
domain knowledge and of the guidance provided by declaratively represented 
control rules, which correspond to mathematical intuition about how to prove 
a theorem in a particular situation. These rules provide a basis for meta-level 
reasoning and goal-directed behavior. 

Domain knowledge is encoded into methods, control rules, and strategies. 
Moreover, methods and control rules can employ external systems (e.g., a
computer algebra system) and make use of the knowledge in these systems. ΩMEGA’s
multi-strategy proof planner Multi [MM00] then searches for a plan using the
acquired methods and strategies, guided by the control knowledge in the control
rules.

2.3.1 AI Principles in Proof Planning. A planning problem is a formal
description of an initial state, a goal, and some operators that can be used to
transform the initial state via some intermediate states to a state that satisfies
the goal. Applied to a planning problem, a planner returns a sequence of actions,
that is, instantiated operators, which reach a goal state from the initial state
when executed. Such a sequence of actions is also called a solution plan.
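The definition can be made concrete with a small STRIPS-style forward planner: states are sets of facts, operators have preconditions, add lists and delete lists, and breadth-first search returns a sequence of actions leading from the initial state to a state that satisfies the goal. This is a generic textbook formulation assumed for illustration, not ΩMEGA's planner, and the two-operator domain is hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]      # facts that must hold before the action
    add: FrozenSet[str]      # facts made true by the action
    delete: FrozenSet[str]   # facts made false by the action

def plan(init: FrozenSet[str], goal: FrozenSet[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest solution plan or None."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                    # the state satisfies the goal
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None

acts = [Action("boil", frozenset({"water"}), frozenset({"hot"}), frozenset()),
        Action("brew", frozenset({"hot", "leaves"}), frozenset({"tea"}),
               frozenset({"leaves"}))]
print(plan(frozenset({"water", "leaves"}), frozenset({"tea"}), acts))  # ['boil', 'brew']
```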

Proof planning considers mathematical theorems as planning problems 
[Bun88]. The initial state of a proof planning problem consists of the proof
assumptions of the theorem, whereas the goal is the theorem itself. The operators
in proof planning are the methods. 

In ΩMEGA, proof planning is the process that computes actions, that is,
instantiations of methods, and assembles them in order to derive a theorem from
a set of assumptions. The effects and the preconditions of an action in proof
planning are proof nodes with formulae in the higher order language POST,
where the effects are considered as logically inferable from the preconditions. A
proof plan under construction is represented in the proof plan data structure
PDS (see Section 2.5). Initially, the PDS consists of an open node containing
the statement to be proved, and closed, that is, justified, nodes for the proof
assumptions. The introduction of an action changes the PDS by adding new
proof nodes and justifying the effects of the action by applications of the method
of the action to its premises. The aim of the proof planning process is to reach
a closed PDS, that is, a PDS without open nodes. The solution proof plan
produced is then a record of the sequence of actions that lead to a closed PDS.

By allowing for forward and backward actions, ΩMEGA’s proof planning
combines forward and backward state-space planning. Thus, a planning state is a
pair of the current world state and the current goal state. The initial world state
consists of the given proof assumptions and is transferred by forward actions into
a new world state. The goal state consists of the initial open node and is
transferred by backward actions into a new goal state containing new open nodes.
From this point of view the aim of proof planning is to compute a sequence of
actions that derives a current world state in which all the goals in the current
goal state are satisfied.
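As a rough illustration of a planning state as a pair (world state, goal state), the sketch below treats formulas as opaque strings, applies forward actions to the world state and backward actions to the goal state, and stops once every open goal occurs in the world state. The encoding (a forward method derives one new fact from a set of premises; a backward method replaces one goal by a list of subgoals), the depth bound, and the method names MP and AndI are assumptions made for brevity.

```python
from typing import FrozenSet, List, Optional, Tuple

Forward = Tuple[str, FrozenSet[str], str]   # (name, premises, derived fact)
Backward = Tuple[str, str, List[str]]       # (name, goal it reduces, new subgoals)

def proof_plan(world: FrozenSet[str], goals: FrozenSet[str],
               fwd: List[Forward], bwd: List[Backward],
               depth: int = 6) -> Optional[List[str]]:
    """Depth-bounded search over (world state, goal state) pairs."""
    if goals <= world:                      # every open goal is satisfied
        return []
    if depth == 0:
        return None
    for name, prem, fact in fwd:            # forward action: extend the world state
        if prem <= world and fact not in world:
            rest = proof_plan(world | {fact}, goals, fwd, bwd, depth - 1)
            if rest is not None:
                return [name] + rest
    for name, goal, subgoals in bwd:        # backward action: replace an open goal
        if goal in goals and goal not in world:
            new_goals = (goals - {goal}) | frozenset(subgoals)
            rest = proof_plan(world, new_goals, fwd, bwd, depth - 1)
            if rest is not None:
                return [name] + rest
    return None

# Hypothetical example: from the assumptions A and A->B, prove "B and A".
fwd = [("MP", frozenset({"A", "A->B"}), "B")]
bwd = [("AndI", "B and A", ["B", "A"])]
print(proof_plan(frozenset({"A", "A->B"}), frozenset({"B and A"}), fwd, bwd))
# ['MP', 'AndI']
```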
As opposed to precondition achievement planning (e.g., see [Wel94]), effects
of methods in proof planning do not cancel each other. For instance, an action
with effect $\neg F$ introduced for the open node $L_1$ does not threaten the effect
$F$ introduced by another action for the open node $L_2$. Dependencies among
open nodes result from shared variables for witness terms and their constraints.
Constraints can be, for instance, instantiations for the variables but they can 
also be mathematical constraints such as x < c, which states that, whatever 
the instantiation for x is, it has to be smaller than c. The constraints created 
during the proof planning process are collected in a constraint store. An action 
introducing new constraints is applicable only if its constraints are consistent 
with the constraints collected so far. Dependencies among goals with shared 
variables are difficult to analyze and can cause various kinds of failures in a 
proof planning attempt. First results about how to analyze and deal with such 
failures are discussed in [Mei03]. 
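The applicability test for actions can be sketched with a toy constraint store that keeps, for each variable, an open interval of admissible real values and rejects an action whose constraints would empty some interval. The interval representation, the restriction to bounds of the form x < c and x > c, and the class name are assumptions chosen for brevity; a solver such as CoSIE handles far richer inequalities and equations, including constraints between variables.

```python
import math
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, float]   # (variable, "<" or ">", constant), e.g. ("x", "<", 1.0)

class ConstraintStore:
    """Keeps an open interval (lo, hi) of admissible real values per variable."""

    def __init__(self) -> None:
        self.bounds: Dict[str, Tuple[float, float]] = {}

    def _tightened(self, cs: List[Constraint]) -> Dict[str, Tuple[float, float]]:
        new = dict(self.bounds)
        for var, op, c in cs:
            lo, hi = new.get(var, (-math.inf, math.inf))
            if op == "<":
                hi = min(hi, c)
            elif op == ">":
                lo = max(lo, c)
            else:
                raise ValueError(f"unsupported relation {op!r}")
            new[var] = (lo, hi)
        return new

    def consistent_with(self, cs: List[Constraint]) -> bool:
        # An open real interval (lo, hi) is non-empty exactly when lo < hi.
        return all(lo < hi for lo, hi in self._tightened(cs).values())

    def add(self, cs: List[Constraint]) -> bool:
        """Commit an action's constraints only if the store stays consistent."""
        if not self.consistent_with(cs):
            return False
        self.bounds = self._tightened(cs)
        return True

store = ConstraintStore()
print(store.add([("x", "<", 1.0)]))   # True
print(store.add([("x", ">", 0.5)]))   # True
print(store.add([("x", ">", 2.0)]))   # False: action rejected, store unchanged
print(store.bounds["x"])              # (0.5, 1.0)
```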



Methods, Control Rules, and Strategies. Methods are traditionally perceived
as tactics in tactical theorem proving augmented with preconditions and
effects, called premises and conclusions, respectively. A method represents the
inference of the conclusion from the premises. For instance, NotI-M is a method
whose purpose is to prove a goal $\Gamma \vdash \neg P$ by contradiction. If NotI-M is
applied to a goal $\Gamma \vdash \neg P$, then it closes this goal and introduces the new
goal to prove falsity, $\bot$, under the assumption $P$, that is, $\Gamma, P \vdash \bot$. Thereby,
$\Gamma \vdash \neg P$ is the conclusion of the method, whereas $\Gamma, P \vdash \bot$ is the premise of
the method. NotI-M is a backward method, which reduces a goal (the conclusion)
to new goals (the premises). Forward methods, in contrast, derive new
conclusions from given premises. For instance, =Subst-m performs equality
substitutions by deriving from two premises $\Gamma \vdash P[a]$ and $\Gamma \vdash a = b$ the
conclusion $\Gamma \vdash P[b]$, where an occurrence of $a$ is replaced by an occurrence of $b$.
Note that NotI-M and =Subst-m are simple examples of domain-independent,
logic-related methods, which are needed in addition to domain-specific,
mathematically motivated methods. Knowledge-based proof planning expands on
these ideas and allows more general mathematical techniques to be encapsulated
into methods.
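The two example methods can be written down as functions on sequents. In this sketch a sequent is a pair (hypotheses, formula), formulas are nested tuples with the operator first, P[a] is modeled by replacing the leftmost occurrence of the term a, and both premises of =Subst-m are required to have the same hypotheses. This representation and the function names are assumptions made for illustration, not ΩMEGA's method language.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

Formula = Union[str, tuple]                   # e.g. "P", ("not", "P"), ("=", "a", "b")
Sequent = Tuple[FrozenSet[Formula], Formula]  # (hypotheses Gamma, conclusion)
FALSUM = "false"

def not_i_m(goal: Sequent) -> Optional[List[Sequent]]:
    """Backward method: reduce the goal Gamma |- not P to the premise Gamma, P |- false."""
    gamma, phi = goal
    if isinstance(phi, tuple) and phi[0] == "not":
        return [(gamma | {phi[1]}, FALSUM)]
    return None                                # not applicable to this goal

def replace_once(phi: Formula, a: Formula, b: Formula) -> Tuple[Formula, bool]:
    """Replace the leftmost occurrence of the term a by b in phi."""
    if phi == a:
        return b, True
    if isinstance(phi, tuple):
        parts = list(phi)
        for i in range(1, len(parts)):        # index 0 holds the operator symbol
            new, done = replace_once(parts[i], a, b)
            if done:
                parts[i] = new
                return tuple(parts), True
    return phi, False

def subst_m(p1: Sequent, p2: Sequent) -> Optional[Sequent]:
    """Forward method: from Gamma |- P[a] and Gamma |- a = b derive Gamma |- P[b]."""
    (g1, phi), (g2, eq) = p1, p2
    if g1 != g2 or not (isinstance(eq, tuple) and eq[0] == "="):
        return None
    new_phi, done = replace_once(phi, eq[1], eq[2])
    return (g1, new_phi) if done else None

gamma = frozenset({"H"})
print(not_i_m((gamma, ("not", "P"))) == [(frozenset({"H", "P"}), FALSUM)])  # True
print(subst_m((gamma, ("<", "a", "c")), (gamma, ("=", "a", "b")))[1])        # ('<', 'b', 'c')
```

Returning None when a method does not apply mirrors the idea that a method's premises and conclusions act as its application conditions.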

Control rules represent mathematical knowledge about how to proceed in 
the proof planning process. They can influence the planner’s behavior at choice 
points (e.g., which goal to tackle next or which method to apply next) by prefer- 
ring members of the corresponding list of alternatives (e.g., the list of possible 
goals or the list of possible methods). This way promising search paths are
preferred and the search space can be pruned.

Strategies employ different sets of methods and control rules and, thus, tackle 
the same problem in different ways. The reasoning as to which strategy to employ 
on a problem is an explicit choice point in Multi. In particular, Multi can 
backtrack from chosen strategies and search at the level of strategies. 
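Control rules and strategies can be combined in one small sketch: a control rule reorders the methods at a choice point, a method either yields subgoals or fails, failure of a method backtracks to the next one, and failure of a whole strategy backtracks to the next strategy. The string goals, the two strategies, and the methods Axiom and AndSplit are hypothetical, and this is a deliberately simplified model of the behavior described for Multi, not its implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Method = Callable[[str], Optional[List[str]]]  # goal -> subgoals, or None if inapplicable
ControlRule = Callable[[str], bool]            # True means "prefer this method"
Strategy = Tuple[Dict[str, Method], List[ControlRule]]

def order_methods(names: List[str], rules: List[ControlRule]) -> List[str]:
    """Choice point: methods preferred by earlier control rules come first."""
    def rank(name: str) -> int:
        return next((i for i, rule in enumerate(rules) if rule(name)), len(rules))
    return sorted(names, key=rank)             # stable: ties keep their order

def plan_goal(goal: str, strategy: Strategy, depth: int) -> Optional[List[str]]:
    methods, rules = strategy
    if goal == "true":
        return []
    if depth == 0:
        return None
    for name in order_methods(list(methods), rules):
        subgoals = methods[name](goal)
        if subgoals is None:
            continue
        plan = [name]
        for sg in subgoals:
            sub = plan_goal(sg, strategy, depth - 1)
            if sub is None:
                break                          # method failed: backtrack to the next one
            plan += sub
        else:
            return plan
    return None

def multi_strategy(goal: str, strategies: Dict[str, Strategy],
                   depth: int = 5) -> Optional[Tuple[str, List[str]]]:
    """Strategy choice is itself a choice point; a failing strategy is abandoned."""
    for sname, strategy in strategies.items():
        plan = plan_goal(goal, strategy, depth)
        if plan is not None:
            return sname, plan
    return None

known = {"p", "q"}                             # hypothetical proof assumptions
axiom: Method = lambda g: ["true"] if g in known else None
split: Method = lambda g: g.split("&") if "&" in g else None
strategies: Dict[str, Strategy] = {
    "AxiomsOnly": ({"Axiom": axiom}, []),
    "Decompose": ({"AndSplit": split, "Axiom": axiom}, [lambda m: m == "Axiom"]),
}
print(multi_strategy("p&q", strategies))  # ('Decompose', ['AndSplit', 'Axiom', 'Axiom'])
```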

Detailed discussions of ΩMEGA’s method and control rule language can be
found in [Mei03,MMP02]. A detailed introduction to proof planning with
multiple strategies is given in [MM00].
2.4 Interface and System Support

ΩMEGA’s graphical user interface LΩUI [SHB+99] displays the current proof
state in multiple modalities: a graphical map of the proof tree, a linearized 
presentation of the proof nodes with their formulae and justifications, a term 
browser, and a natural language presentation of the proof via P.rex (see Fig. 4 
and 5). 




Fig. 4. Multi-modal proof presentation in the graphical user interface LΩUI.



When inspecting a part of a proof, the user can switch between alternative levels
of abstraction, for example, by expanding a node in the graphical map of the 
proof tree, which causes appropriate changes in the other presentation modes. 
Moreover, an interactive natural language explanation of the proof is provided by 
the system P.rex [Fie01b,Fie01a,Fie01c], which is adaptive in the following sense: 
it explains a proof step at the most abstract level (which the user is assumed 
to know) and then reacts flexibly to questions and requests, possibly at a lower 
level of abstraction, for example, by detailing some ill-understood subproof. 

Fig. 5. Natural language proof presentation by P.rex in LΩUI.

Another form of system support is the guidance mechanism provided by the
suggestion module ΩAnts [BS98,BS99,BS00,Sor01], which searches proactively for
possible actions that may be helpful in finding a proof and orders them in a
preference list. Examples of such actions are an application of a particular
calculus rule, the call of a tactic or a proof method as well as a call of an external
reasoning system, or the search for and insertion of facts from the knowledge base
MBase. The general idea is the following: every inference rule, tactic, method or
external system is “agentified” in the sense that every possible action searches
concurrently for the fulfillment of its application conditions and, once these are
satisfied, it suggests its execution. User-definable heuristics select and display
the suggestions to the user. ΩAnts is based on a hierarchical blackboard, which
collects the data about the current proof state.
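The “agentified” suggestion mechanism can be approximated by running each candidate action's applicability check concurrently against the same proof state, collecting the results in a shared table that stands in for the blackboard, and ranking the applicable actions by a score that plays the role of a user-definable heuristic. The thread pool, the flat table, the scores, and the agent names are assumptions for illustration; ΩAnts' hierarchical blackboard architecture is considerably richer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

ProofState = Dict[str, str]
Agent = Callable[[ProofState], Optional[float]]   # score if applicable, else None

def suggest(state: ProofState, agents: Dict[str, Agent]) -> List[str]:
    """Run every agent's applicability check concurrently on the same state and
    return the applicable actions, highest score first."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(check, state) for name, check in agents.items()}
        blackboard = {name: fut.result() for name, fut in futures.items()}
    applicable = {n: s for n, s in blackboard.items() if s is not None}
    return sorted(applicable, key=applicable.__getitem__, reverse=True)

agents: Dict[str, Agent] = {
    "CallMaple": lambda st: 0.9 if "=" in st["goal"] else None,
    "NotI-M":    lambda st: 0.8 if st["goal"].startswith("not ") else None,
    "CallOtter": lambda st: 0.5,
}
print(suggest({"goal": "x + 0 = x"}, agents))   # ['CallMaple', 'CallOtter']
```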



2.5 Proof Objects 

The central data structure for the overall search is the proof plan data structure
PDS in Fig. 2. This is a hierarchical data structure that represents a (partial)
proof at different levels of abstraction (called partial proof plans). Technically,
it is an acyclic graph, where the nodes are justified by tactic applications.
Conceptually, each such justification represents a proof plan (the expansion of the
justification) at a lower level of abstraction, which is computed when the tactic
is executed. In ΩMEGA, we explicitly keep the original proof plan as well as
intermediate expansion layers in an expansion hierarchy. The coexistence of several
abstraction levels and the dynamic maintenance of their relationship is a
central design objective of ΩMEGA’s PDS. Thus the PDS makes the hierarchical
structure of proof plans explicit and retains it for further applications such as
proof explanation with P.rex or an analogical transfer of plans. The lowest level
of abstraction of a PDS is the ND calculus.

The proof object generated by ΩMEGA, for example, for the “irrationality of
$\sqrt{2}$” theorem is recorded in a technical report [BFMP02], where the unexpanded
and the expanded proof objects are presented in great detail, that is, in a little
less than a thousand proof steps. A general presentation of this interesting case
study is [SBF+03].

2.6 Case Studies 

Early developments of proof planning in Alan Bundy’s group at Edinburgh used 
proofs by induction as their favorite case studies [Bun88]. The ΩMEGA system
has been used in several other case studies, which illustrate in particular the 
interplay of the various components, such as proof planning supported by het- 
erogeneous external reasoning systems. 

A typical example of a class of problems that cannot be solved by traditional
automated theorem provers is the class of $\varepsilon$-$\delta$-proofs [MS99,Mel98a]. This class
was originally proposed by Woody Bledsoe [Ble90] and it comprises theorems
such as LIM+ and LIM*, where LIM+ states that the limit of the sum of two
functions equals the sum of their limits and LIM* makes the corresponding
statement for multiplication. The difficulty of this domain arises from the need
for arithmetic computation in order to find a suitable instantiation of free
(existential) variables (such as a $\delta$ depending on an $\varepsilon$). Crucial for the success of
ΩMEGA’s proof planning is the integration of suitable experts for these tasks: the
arithmetic computation is done by the computer algebra system Maple, and
an appropriate instantiation for $\delta$ is computed by the constraint solver CoSIE.
We have been able to solve all challenge problems suggested by Bledsoe and
many more theorems in this class taken from a standard textbook on real
analysis [BS82].
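The kind of witness the planner has to construct can be seen in the standard textbook argument for LIM+, written out below for illustration (it is not ΩMEGA's output). The symbols $f$, $g$, $a$, $L_1$, $L_2$, $\delta_1$, $\delta_2$ are introduced here, with $\lim_{x\to a} f(x) = L_1$ and $\lim_{x\to a} g(x) = L_2$ assumed. The side conditions $\delta \le \delta_1$ and $\delta \le \delta_2$ are exactly the sort of constraints on an existential variable that a constraint solver can collect and then satisfy.

```latex
\begin{align*}
&\text{Given } \varepsilon > 0, \text{ pick } \delta_1, \delta_2 > 0 \text{ such that}\\
&\quad 0 < |x-a| < \delta_1 \implies |f(x) - L_1| < \tfrac{\varepsilon}{2},
  \qquad 0 < |x-a| < \delta_2 \implies |g(x) - L_2| < \tfrac{\varepsilon}{2}.\\
&\text{Set } \delta = \min(\delta_1, \delta_2). \text{ Then for } 0 < |x-a| < \delta:\\
&\quad |(f(x) + g(x)) - (L_1 + L_2)| \le |f(x) - L_1| + |g(x) - L_2|
  < \tfrac{\varepsilon}{2} + \tfrac{\varepsilon}{2} = \varepsilon.
\end{align*}
```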

Another class of problems we tackled with proof planning is concerned with
residue classes [MPS02,MPS01]. In this domain we show theorems such as
“the residue class structure (ℤ₅, +) is associative”, “it has a unit element”,
and similar properties, where ℤ₅ = {0₅, 1₅, 2₅, 3₅, 4₅} is the set of all congruence
classes modulo 5 and + is the addition on residue classes. We have also
investigated whether two given structures are isomorphic or not, and altogether we
have proved more than 10,000 theorems of this kind (see [Sor01]). Although the
problems in this domain are still within the range of difficulty a traditional
automated theorem prover can handle, it was nevertheless an interesting case study
for proof planning, since multi-strategy proof planning generated substantially
different proofs based on entirely different proof ideas.
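For a concrete finite structure, the properties named above can be decided by brute-force enumeration. This is one reason why such problems are within reach of traditional provers. The Python sketch below performs only such an enumeration. It is not proof planning and says nothing about the different proof ideas that were found. The function names are illustrative assumptions, and residue classes are represented by the integers 0..n-1.

```python
from itertools import permutations, product

def add_mod(n):
    """Addition on residue classes modulo n, with classes represented by 0..n-1."""
    def op(a, b):
        return (a + b) % n
    return op

def add_pair(m, n):
    """Componentwise addition on the direct product Z_m x Z_n."""
    def op(a, b):
        return ((a[0] + b[0]) % m, (a[1] + b[1]) % n)
    return op

def is_associative(elems, op):
    """Check (a op b) op c == a op (b op c) for all triples of elements."""
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(elems, repeat=3))

def unit_element(elems, op):
    """Return a two-sided unit element if one exists, otherwise None."""
    for e in elems:
        if all(op(e, a) == a and op(a, e) == a for a in elems):
            return e
    return None

def is_isomorphic(elems1, op1, elems2, op2):
    """Try every bijection h: elems1 -> elems2 and test h(a op1 b) == h(a) op2 h(b)."""
    if len(elems1) != len(elems2):
        return False
    for image in permutations(elems2):
        h = dict(zip(elems1, image))
        if all(h[op1(a, b)] == op2(h[a], h[b]) for a, b in product(elems1, repeat=2)):
            return True
    return False

Z5 = list(range(5))
print(is_associative(Z5, add_mod(5)))  # True
print(unit_element(Z5, add_mod(5)))    # 0

Z6, Z2xZ3 = list(range(6)), list(product(range(2), range(3)))
Z4, Z2xZ2 = list(range(4)), list(product(range(2), range(2)))
print(is_isomorphic(Z6, add_mod(6), Z2xZ3, add_pair(2, 3)))  # True
print(is_isomorphic(Z4, add_mod(4), Z2xZ2, add_pair(2, 2)))  # False
```

The isomorphism test tries all n! bijections, so it is only practical for very small structures.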

Another important proof technique is Cantor’s diagonalization, and
we have also developed methods and strategies for this class [CS98]. Important
theorems we have been able to prove are the undecidability of the halting problem
and Cantor’s theorem (the cardinality of the set of subsets), the non-countability
of the reals in the interval [0, 1] and of the set of total functions, and similar
theorems.
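The core construction behind these diagonalization proofs fits in a few lines. This sketch is an illustration, not one of the Ωmega methods. A 0/1-valued sequence is modelled as a Python function on the natural numbers, and a finite list stands in for an enumeration. The diagonal sequence differs from the i-th listed sequence at position i, so it cannot occur anywhere in the list.

```python
from typing import Callable, List

# A 0/1-valued sequence is modelled as a function from natural numbers to {0, 1}.
Seq = Callable[[int], int]

def diagonal(enumeration: List[Seq]) -> Seq:
    """Return the sequence d with d(i) = 1 - enumeration[i](i).

    d differs from the i-th sequence at position i, so it equals none of them.
    (With a finite list, d is only defined on positions 0..len(enumeration)-1.)
    """
    def d(i: int) -> int:
        return 1 - enumeration[i](i)
    return d

fs: List[Seq] = [lambda i: 0, lambda i: 1, lambda i: i % 2, lambda i: int(i > 2)]
d = diagonal(fs)
print([d(i) for i in range(len(fs))])               # [1, 0, 1, 0]
print(all(d(i) != f(i) for i, f in enumerate(fs)))  # True
```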

Finally, completeness proofs for refinements of resolution are a good candidate
for a standard proof technique: the theorem is usually first shown at the ground
level using the excess-literal-number technique, and then ground completeness
is lifted to the predicate calculus. We have done this for many
refinements of resolution with Ωmega [Geb99].

However, Ωmega’s main aim is to become a proof assistant tool for the
working mathematician. Hence, it should support interactive proof development
at a user-friendly level of abstraction. The mathematical theorem that √2 is
not rational, and its well-known proof dating back to the School of Pythagoras,
provide an excellent challenge to evaluate whether this ambitious goal has been
reached. In [Wie02], fifteen systems that have solved the √2-problem present their
results. The protocols of their respective sessions have been compared on a
multidimensional scale in order to assess the “naturalness” with which real mathematical
problems of this kind can be proved within the respective system.
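For reference, the top-level proof that the case study asks the systems to reproduce is the classical argument by contradiction. It is stated here in textbook form. Its one nontrivial lemma is that if m² is even, then m is even:

```latex
\begin{align*}
&\text{Assume } \sqrt{2} = \tfrac{m}{n} \text{ with integers } m, n,\ n \neq 0, \text{ and no common divisor.}\\
&\text{Then } 2 \cdot n^2 = m^2, \text{ so } m^2 \text{ is even, hence } m \text{ is even: } m = 2k.\\
&\text{Substituting gives } 2n^2 = 4k^2, \text{ i.e. } n^2 = 2k^2, \text{ so } n \text{ is even as well.}\\
&\text{Thus 2 divides both } m \text{ and } n, \text{ a contradiction; hence } \sqrt{2} \notin \mathbb{Q}.
\end{align*}
```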

This represents an important shift of emphasis in the field of automated
deduction away from the somewhat artificial problems of the past (as
represented, for example, in the test set of the TPTP library [SSY94]) back to real
mathematical challenges.

We participated in this case study essentially with three different
contributions. Our initial contribution was an interactive proof in Ωmega without
adding special domain knowledge to the system. For further details on this case
study, which particularly demonstrates the use of Ωmega as a tactical theorem
prover, we refer to [BFMP02]. The most important, albeit not entirely new,
lesson to be learned from this experiment is that the level of abstraction common
in most automated and tactical theorem proving environments is far too low.
While our proof representation is already an abstraction (called the assertion
level in [Hua94]) from the calculus level typical for most ATPs, it is nevertheless
clear that as long as a system does not hide all these excruciating details, no
working mathematician will feel inclined to use such a system. In fact, this is in
our opinion one of the critical impediments for using first order ATPs and one,
albeit not the only one, of the reasons why they are not used as widely as, say,
computer algebra systems.

This is the crucial issue of the Ωmega project and our main motivation
for departing from the classical paradigm of automated theorem proving about
fifteen years ago.

Our second contribution to the case study of the √2-problem is based on
interactive island planning [Mel96], a technique that expects an outline of the
proof, i.e., the user provides main subgoals, called islands, together with their
assumptions. The details of the proof, eventually down to the logic level, are
postponed. Hence, the user can write down his proof idea in a natural way with
as many gaps as there are open at this first stage of the proof. Closing the gaps
is ideally fully automatic, in particular by exploiting external systems. However,
for difficult theorems the user more often than not has to provide
additional information and apply the island approach recursively.

In comparison to our first, tactic-based solution, the island style supports a
much more abstract and user-friendly interaction level. The proofs are now at a
level of abstraction similar to proofs in mathematical textbooks.

Our third contribution to the case study of the √2-problem is a fully
automatically planned and expanded proof of the theorem. The details of this very
important case study, which best shows what can (and what cannot) be achieved with
current technology, are presented in [SBF+03], [SBF+02], and [BFMP02].

The most important question to ask is: Can we find the essential and creative
steps automatically? The answer is yes, as we have shown in [SBF+03]. However,
while we can answer the question in the affirmative, not every reader may be
convinced, as our solution touches upon a subtle point, which opens the Pandora’s
box of critical issues in the paradigm of proof planning [Bun02]: It is always easy
to write some specific methods which perform just the steps in the interactively
found proof, and then call the proof planner Multi to fit the methods together
into a proof plan for the given problem. This, of course, shows nothing of
substance: Just as we could write down all the definitions and theorems required for
the problem in first order predicate logic and hand them to a first order prover⁵,
we would just hand-code the final solution into appropriate methods.

Instead, the goal of the game is to find general methods for a whole class of
theorems within some theory that can solve not only this particular problem,
but also all the other theorems in that class. While our approach essentially
follows the proof idea of the interactively constructed proof for the √2-problem,
it relies on more general concepts, such that we can solve, for example,
ʲ√l-problems (concerning the j-th root of l) for arbitrary natural numbers j and l.

However, this is certainly not the end of the story; in order to evaluate the 
appropriateness of a proof planning approach we suggest the following three 
criteria: 

(1) How general and how rich in mathematical content are the methods and control rules?

(2) How much search is involved in the proof planning process? 

(3) What kind of proof plans, that is, what kind of proofs, can we find? 

These criteria should allow us to judge how general and how robust our
solution is. The art of proof planning is to acquire domain knowledge that,
on the one hand, comprises meaningful mathematical techniques and powerful
heuristic guidance, and, on the other hand, is general enough to tackle a broad
class of problems. For instance, as one extreme, we could have methods that
encode Ωmega’s ND calculus, and we could run Multi without any control.
This approach would certainly be very general, but Multi would fail to prove
any interesting problems. As the other extreme, we could cut a known proof
into pieces and code the pieces as methods. Guided by control rules that always
pick the next right piece of the proof, Multi would reassemble the methods into
the original proof without performing any search.

5 This was done when Otter tackled the √2-problem; see [Wie02] for the original
Otter case study and [BFMP02] for its replay with Ωmega.

The amount of search and the variety of potential proof plans for a given
problem are measures for the generality of the methods and also for the
appropriateness of tackling the class of problems by planning. If tight control rules or
highly specific methods restrict the search to just one branch in the search tree,
then the resulting proof plans will merely instantiate a pattern. In this case, a
single tactic or method that realizes the proof steps of the underlying pattern is
more suitable than planning. The possibility of creating a variety of proof plans
with the given methods and control rules is thus an important feature.

What general lessons can we learn from small, albeit typical mathematical 
challenges of this kind? 

1. The devil is in the detail, that is, it is always possible to hide the crucial
creative step (represented as a method or, in the object language, as an
appropriate lemma) and to pretend a level of generality that has not
actually been achieved. To evaluate a solution, all tactics, methods, theorems,
lemmata and definitions have to be made explicit.

2. The enormous distance between the well-known (top-level) proof of the
Pythagorean School, which consists of about a dozen single proof steps, and
the final (non-optimized) proof at the ND level with 753 inference steps is
striking. This is, of course, not a new insight. While mathematics can in
principle be reduced to purely formal logic-level reasoning,
as demonstrated by Russell and Whitehead as well as the Hilbert School,
nobody would actually want to do so in practice, as the influential Bourbaki
group showed: only the first quarter of the first volume in the several-dozen-volume
set on the foundation of mathematics starts with elementary, logic-level
reasoning, and then proceeds with the crucial sentence [Bou68]: “No
great experience is necessary to perceive that such a project [of complete
formalization] is absolutely unrealizable: the tiniest proof at the beginning
of the theory of sets would already require several hundreds of signs for its
complete formalization.”

3. Finally, and more to the general point of interest in mathematical support
systems: Now that we can prove theorems in the √2-problem class, the
skeptical reader may still ask: So what? Will this ever lead to a general system
for mathematical assistance?

We have shown that the class of ε-δ-proofs for limit theorems can indeed
be solved with a few dozen mathematically meaningful methods and control
rules (see [MS99,Mel98b,Mei03]). Similarly, the domain of group theory with
its class of residue theorems can be formalized with even fewer methods
(see [MS00,MPS01,MPS02])⁶. An interesting observation is also that these
methods by and large correspond to the kind of mathematical knowledge a
freshman would have to learn to master this level of professionalism.

Do the above observations now hold for our ʲ√l-problems? The unfortunate
answer is probably No! Imagine the subcommittee of the United Nations in
charge of the maintenance of the global mathematical knowledge base a
hundred years from now. Would they accept the entry of our methods, tactics and
control rules for the ʲ√l-problems? Probably not!

Factual mathematical knowledge is preserved in books and monographs, but
the art of doing mathematics [Pol73,Had44] is passed on by word of mouth from
generation to generation. The methods and control rules of the proof planner
correspond to important mathematical techniques and “ways to solve it”, and they
make this implicit and informal mathematical knowledge explicit and formal.

The theorems about ʲ√l-problems are shown by contradiction, that is, the
planner derives a contradiction from the equation l · nʲ = mʲ, where n and m are
integers with no common divisor. However, these problems belong to the more
general class of problems of determining whether two complex mathematical objects X and
Y are equal. A general mathematical principle for comparing two complex
objects is to look at their characteristic properties, for example, their normal
forms or some other uniform notation in the respective theory.
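For ʲ√l-problems, the normal-form principle can be made concrete. Compare the prime factorizations (the normal forms) of both sides of l · nʲ = mʲ. This shows that every prime exponent of l must be a multiple of j. The Python sketch below decides the irrationality of ʲ√l this way. It illustrates the mathematical principle only and does not reconstruct Multi's methods. The function names are assumptions.

```python
from typing import Dict

def prime_factorization(n: int) -> Dict[int, int]:
    """Normal form of n >= 1: map each prime p to its exponent in n (trial division)."""
    factors: Dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # the remaining cofactor is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def root_is_irrational(l: int, j: int) -> bool:
    """True iff the j-th root of the natural number l >= 1 is irrational.

    If l * n**j == m**j held for positive integers n, m, then for every prime p the
    exponent of p in l would equal j * (exponent in m - exponent in n), i.e. be a
    multiple of j. Conversely, if all exponents are multiples of j, l is a perfect
    j-th power and its j-th root is an integer.
    """
    return any(e % j != 0 for e in prime_factorization(l).values())

print(root_is_irrational(2, 2))   # True:  the square root of 2
print(root_is_irrational(12, 2))  # True
print(root_is_irrational(8, 3))   # False: the cube root of 8 is 2
```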

And this is the crux of the matter: to find general mathematical principles 
and encode them into appropriate methods, control rules and strategies such 
that an appropriately large class of problems can be solved with these methods. 

We are now working on formalizing these methods in more general terms and
then instantiating them with appropriate parameters for the domain in question
(number theory, set theory, or polynomial rings); the crucial creative step
of the system Multi is then to find the instantiation by some general heuristics.

3 The Future: What Next? 

The vision of a powerful mathematical assistance environment which provides
computer-based support for most tasks of a mathematician has stimulated new
projects and international research networks across the disciplinary and systems
boundaries. Examples are the European Calculemus⁷ (Integration of Symbolic
Reasoning and Symbolic Computation) and MKM⁸ (Mathematical Knowledge
Management, [BGH03]) initiatives, the EU projects MONET⁹, OpenMath and
MOWGLI¹⁰, and the American QPQ¹¹ repository of deductive software tools.

6 The generally important observation is not, of course, whether we need a dozen or a 
hundred methods, but that we don’t need a few thousand or a million. A few dozen 
methods seem to be generally enough for a restricted mathematical domain. 

7 www.calculemus.org

8 monet.nag.co.uk/mkm/index.html

9 monet.nag.co.uk/cocoon/monet/index.html

10 www.mowgli.cs.unibo.it/

11 www.qpq.org
Furthermore, there are now numerous national projects in the US and Europe,
which cover partial aspects of this vision, such as knowledge representation,
deductive system support, user interfaces, mathematical publishing tools, etc.

The long-term goal of the Ωmega project is the all-embracing integration of
symbolic reasoning, i.e., computer algebra and deduction systems, into
mathematical research, mathematics education, and formal methods in computer
science. We anticipate that in the long run these systems will change mathematical
practice and will have a strong societal impact, not least in the sense that
a powerful infrastructure for mathematical research and education will become
commercially available. Computer-supported mathematical reasoning tools and
integrated assistance systems will be further specialized to have a strong impact
also in many other theoretical fields, such as safety and security verification of
computer software and hardware, theoretical physics and chemistry, and other
related subjects.
Fig. 6. Mathematical Creativity Spiral; [Buchberger, 1995].



Our current approach is strictly bottom-up: Starting with existing techniques
and tools of our partners for symbolic reasoning (deduction) and symbolic
computation (computer algebra), we will step by step improve their interoperability
up to the realization of an integrated system via the mathematical software
bus MathWeb-SB. The envisaged system will support the full life cycle of the
evolutionary nature of mathematical research (see Fig. 6), helping an engineer
or mathematician working on a mathematical problem with the improvement,
exploration, distributed maintenance, and retrieval of mathematical theories,
with proving and calculation tasks, and finally with their publication.

So what does this vision entail in the immediate future? 
3.1 Formalization and Proving at a Higher Level of Abstraction

Mathematical reasoning with the Ωmega system is at the comparatively high
level of abstraction of the proof planning methods. However, as these
methods have to be expanded eventually to the concrete syntax of our higher order
ND calculus, the system still suffers from the effect and influence of this logical
representation. In contrast, the proofs developed by a mathematician, say for
a mathematical publication, and the proofs developed by a student in a
mathematical tutoring system are typically developed at an argumentative level. This
level has been formally categorized as proofs at the assertion level [Hua94] with
different types of under-specification [ABF+03]¹². The CoRe system [Aut03]
has been designed to achieve this, and the goal is now to completely replace
the current natural deduction calculus with the CoRe calculus.

The proposed exchange of the logic layer in Ωmega requires the adaptation
of all reasoning procedures that are currently tailored to it, including proof
planning and the integration of external systems.



3.2 ΩAnts: Agent-Oriented Theorem Proving

Our agent-based suggestion and reasoning mechanism is called ΩAnts [BS00].
Its initial motivation is to turn the hitherto passive Ωmega system into a
pro-active counter-player of the user, which autonomously exploits available
resources. It provides societies of pro-active agents organized via a hierarchical
blackboard architecture that dynamically and concurrently generate suggestions
on applicable proof operators. These ΩAnts agents may also call external
systems or perform search for data in mathematical knowledge bases (see [BMS04]).
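The blackboard idea can be sketched in miniature. In the Python example below, agents run concurrently. Each one inspects the current goal and may post a rated suggestion to a shared, lock-protected blackboard, and the highest-rated suggestion is then offered. The agents, operator names and scores are hypothetical. The real ΩAnts blackboard is hierarchical, while this sketch uses a single level.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from threading import Lock
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Suggestion:
    agent: str
    operator: str
    score: float  # heuristic rating: higher means more promising

class Blackboard:
    """Single-level, thread-safe blackboard to which agents post suggestions."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._entries: List[Suggestion] = []

    def post(self, suggestion: Suggestion) -> None:
        with self._lock:
            self._entries.append(suggestion)

    def snapshot(self) -> List[Suggestion]:
        with self._lock:
            return list(self._entries)

    def best(self) -> Optional[Suggestion]:
        return max(self.snapshot(), key=lambda s: s.score, default=None)

Agent = Callable[[str, Blackboard], None]

def make_agent(name: str, operator: str, applies: Callable[[str], bool], score: float) -> Agent:
    """Build a hypothetical agent that suggests `operator` whenever `applies(goal)` holds."""
    def agent(goal: str, board: Blackboard) -> None:
        if applies(goal):
            board.post(Suggestion(name, operator, score))
    return agent

AGENTS: List[Agent] = [
    make_agent("indirect-agent", "IndirectProof", lambda g: g.startswith("irrational"), 0.9),
    make_agent("induction-agent", "Induction", lambda g: "forall n" in g, 0.7),
    make_agent("defn-agent", "ExpandDefinition", lambda g: "(" in g, 0.5),
    make_agent("atp-agent", "CallExternalProver", lambda g: True, 0.2),
]

def suggest(goal: str) -> Optional[Suggestion]:
    """Run all agents concurrently on `goal` and return the best posted suggestion."""
    board = Blackboard()
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = [pool.submit(agent, goal, board) for agent in AGENTS]
    for future in futures:
        future.result()  # re-raise any exception raised inside an agent
    return board.best()

print(suggest("irrational(sqrt(2))").operator)  # IndirectProof
print(suggest("p").operator)                     # CallExternalProver
```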

We will now provide improved higher order theorem proving agents based
on the provers LEO [BK98] and TPS [ABB00], which analyze the proof context
and determine promising “control settings”. These higher order proof agents will
work in competition with traditional first order proof agents and other
“agentified” reasoning systems.

3.3 Mathematical Knowledge Representation 

A mathematical proof assistant relies upon different kinds of knowledge: first, of
course, the formalized mathematical domain as organized in structured theories
of definitions, lemmata, and theorems. Secondly, there is mathematical
knowledge on how to prove a theorem, which is encoded in tactics and methods, in
ΩAnts agents, in control knowledge and in strategies. This type of knowledge
can be general, theory specific or even problem specific.

The integration of a mathematical proof assistant into the typical
everyday activities of a mathematician, however, requires other types of knowledge
as well. For example, a mathematical tutoring system for students relies upon a
database with different samples of proofs and proof plans linked by meta-data
in order to advise the student. Another example is the support for
mathematical publications: documents containing both formalized and non-formalized
parts need to be related to specific theories, lemmas, theorems, and proofs.

12 “Under-specification” is a technical term borrowed from research on the semantics of
natural language. Illustrating examples and a discussion of our notion can be found
in [ABF+03,BFG+03].

This raises the research challenge of how the usual structuring mechanisms
for mathematical theories (such as theory hierarchies or the import of theories
via renaming or general morphisms) can be extended to tactics and methods
as well as to proofs, proof plans and mathematical documents. Furthermore,
changing any of these elements requires maintenance support, as any change in
one part may have consequences in other parts. For example, the validity of
a proof needs to be checked again after changing parts of a theory, which in
turn may affect the validity of the mathematical documents. This management
of change [AHMS02,AM02,AH02,Hut00,MAH01], originally developed for
evolutionary formal software engineering at the DFKI, will now be integrated into
the Ωmega system as well.
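At its core, such change management relies on dependency tracking. Here is a minimal sketch. It assumes that each item (definition, lemma, proof, document) simply lists the items it uses. After a change, every item that transitively depends on the changed one has to be re-checked. The item names are hypothetical, and the management-of-change systems cited above are far more fine-grained.

```python
from collections import defaultdict, deque
from typing import Dict, Set

def affected_by(changed: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return all items that directly or transitively use `changed` and must be re-checked."""
    users: Dict[str, Set[str]] = defaultdict(set)  # reverse edges: item -> items using it
    for item, deps in depends_on.items():
        for dep in deps:
            users[dep].add(item)
    invalid: Set[str] = set()
    queue = deque([changed])
    while queue:
        for user in users[queue.popleft()]:
            if user not in invalid:
                invalid.add(user)
                queue.append(user)
    return invalid

# Hypothetical dependency data: each item lists what it uses.
DEPENDS_ON = {
    "lemma:square-even-implies-even": {"def:even"},
    "proof:sqrt2-irrational": {"lemma:square-even-implies-even", "def:rational"},
    "doc:lecture-notes": {"proof:sqrt2-irrational"},
    "proof:infinitely-many-primes": {"def:prime"},
}
print(sorted(affected_by("def:even", DEPENDS_ON)))
# ['doc:lecture-notes', 'lemma:square-even-implies-even', 'proof:sqrt2-irrational']
```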

Hierarchically structured mathematical knowledge, i.e., an ontology of
mathematical theories and assertions, was initially stored in Ωmega’s hardwired
mathematical knowledge base. This mathematical knowledge base was later (at the end
of the 1990s) out-sourced and linked to the development of MBase [FK00b]. We
now assume that a mathematical knowledge base also maintains domain-specific
control rules, strategies, and linguistic knowledge. This is not directly a
subject of research in the Ωmega project, which relies here on other groups of the
MKM community and hence on the general development of a worldwide
mathematical knowledge base (“the Semantic Web for Mathematicians”). We shall
nevertheless concentrate on one aspect, namely how to find the appropriate
information.

Semantic Mediators for Mathematical Knowledge Bases. Knowledge
acquisition and retrieval in the currently emerging large repositories of formalized
mathematical knowledge should not be based purely on syntactic matching,
but should be supported by semantic mediators, which suggest applicable
theorems and lemmata in a given proof context.

We are working on appropriately limited HOL reasoning agents for domain- and
context-specific retrieval of mathematical knowledge from mathematical
knowledge bases. For this we shall adopt a two-stage approach as in [BMS04],
which combines syntactically oriented pre-filtering with semantic analysis. The
pre-filters employ efficiently processable criteria based on meta-data and
ontologies that identify sets of candidate theorems of a mathematical knowledge base
that are potentially applicable to a focused proof context. The HOL agents act
as post-filters to exactly determine the applicable theorems of this set. Exact
semantic retrieval includes the following aspects: (i) logical transformations to
see the connection between a theorem in a mathematical knowledge base and
a focused subgoal. Consider, e.g., a theorem of the form A ⇔ B in the
mathematical knowledge base and a subgoal of the form (A ⇒ B) ∧ (¬A ⇒ ¬B):
they are not equal in any syntactical sense, but they denote the same assertion.
(ii) The variables of a theorem in a mathematical knowledge base may have
to be instantiated with terms occurring in a focused subgoal; consider, e.g., a
theorem ∀X. is-square(X × X) and the subgoal is-square(2 × 2). (iii) Free
variables (meta-variables) may occur in a focused subgoal, and they may have to
be instantiated with terms occurring in a theorem of the mathematical knowledge
base; consider, e.g., a subgoal irrational(X) with meta-variable X and a
theorem irrational(√2).
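The three aspects can be illustrated with deliberately simple stand-ins. A truth-table check handles the propositional equivalence in (i). First-order syntactic unification handles (ii) and (iii), preceded by a cheap head-symbol pre-filter in the spirit of the two-stage approach. This Python sketch is a toy with several assumptions. Terms are nested tuples, and variables are strings starting with `?`, assumed to be renamed apart between theorem and goal. There is no occurs check. It is far weaker than the higher order reasoning agents described above, and the knowledge-base entries are hypothetical.

```python
from itertools import product

# (i) Propositional equivalence, checked by truth table.
def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def equivalent(f, g, n_vars: int) -> bool:
    """True iff the Boolean functions f and g agree on every assignment."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_vars))

def iff(a, b):
    return a == b

def split_form(a, b):
    return implies(a, b) and implies(not a, not b)

# (ii)/(iii) Terms are nested tuples; variables are strings starting with "?".
def is_var(t) -> bool:
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(s, t, subst=None):
    """Syntactic first-order unification without occurs check; returns a dict or None."""
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return {**subst, s: t}
    if is_var(t):
        return {**subst, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        for a, b in zip(s, t):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Hypothetical knowledge-base entries.
KB = {
    "square-of-product": ("is-square", ("times", "?X", "?X")),
    "sqrt2-irrational": ("irrational", ("sqrt", "2")),
}

def retrieve(goal, kb):
    """Two stages: cheap pre-filter on the head symbol, then unification as post-filter."""
    hits = []
    for name, theorem in kb.items():
        if theorem[0] != goal[0]:
            continue
        subst = unify(theorem, goal)
        if subst is not None:
            hits.append((name, subst))
    return hits

print(equivalent(iff, split_form, 2))                    # True
print(retrieve(("is-square", ("times", "2", "2")), KB))  # [('square-of-product', {'?X': '2'})]
print(retrieve(("irrational", "?Y"), KB))                # [('sqrt2-irrational', {'?Y': ('sqrt', '2')})]
```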

We are investigating whether this approach can be successfully coupled with 
state-of-the-art search engines such as Google. 

3.4 VerMath: A Global Web for Mathematical Services 

The Internet provides a vast collection of data and computational resources. For
example, a travel booking system combines different information sources, such
as search engines, price computation schemes, and the travel information in
distributed very large databases, in order to answer complex booking requests.
The access to such specialized travel information sources has to be planned, the
obtained results combined and, in addition, the consistency of time constraints
has to be guaranteed.

We want to transfer and apply this methodology to mathematical problem 
solving and develop a system that plans the combination of several mathematical 
information sources (such as mathematical databases), computer algebra sys- 
tems, and reasoning processes (such as theorem provers or constraint solvers). 
Based on the well-developed MathWeb-SB network of mathematical services, 
the existing client-server architecture will be extended by advanced problem 
solving capabilities and semantic brokering of mathematical services. 

The reasoning systems currently integrated in MathWeb-SB have to be
accessed directly via their API; thus the interface to MathWeb-SB is
system-oriented. However, these reasoning systems are also used in applications that
are not necessarily theorem provers, e.g., for the semantical analysis of natural
language, small verification tasks, etc. The main goal of this project¹³ is therefore
twofold:

Problem-Oriented Interface: to develop a more abstract communication
level for MathWeb-SB, such that general mathematical problem
descriptions can be sent to the MathWeb-SB, which in turn returns a solution to
that problem. Essentially, this goal is to move from a service-oriented
interface to a problem-oriented interface for the MathWeb-SB. This is a very old
idea in the development of AI programming languages (early work included
Planner and other languages driven by matching of general descriptions).

Advanced Problem Solving Capabilities: Typically, a given problem cannot
be solved by a single service but only by a combination of several services.
In order to support the automatic selection and combination of existing
services, the key idea is as follows: an ontology will be used for the qualitative
description of MathWeb-SB services, and these descriptions will then be used
as AI planning operators, in analogy to today’s proof planning approach. We
can then use planning techniques [CBE+92,EHN94] to automatically generate
a plan that describes how existing services must be combined to solve a
given mathematical problem.

13 This is a joint project between the University of Saarbrücken (Jörg Siekmann) and
the International University Bremen (Michael Kohlhase).
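A toy version of the service-combination idea treats each service description as a planning operator with required inputs and produced outputs. It then searches forward for a sequence of services that produces the requested result. The service names and data kinds in this Python sketch are hypothetical and do not describe the actual MathWeb-SB interface. Breadth-first search is used here simply because it returns a shortest combination.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical service descriptions: name -> (required input kinds, produced output kinds).
SERVICES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "parse_latex": ({"latex_formula"}, {"openmath_term"}),
    "cas_simplify": ({"openmath_term"}, {"simplified_term"}),
    "atp_prove": ({"simplified_term", "axioms"}, {"proof"}),
    "kb_lookup": ({"theory_name"}, {"axioms"}),
}

def compose(available: Set[str], goal: Set[str],
            services: Dict[str, Tuple[Set[str], Set[str]]]) -> Optional[List[str]]:
    """Forward breadth-first search for a shortest service sequence that yields all goal kinds."""
    start: FrozenSet[str] = frozenset(available)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, (inputs, outputs) in sorted(services.items()):
            # A service is applicable if its inputs are available and it adds something new.
            if inputs <= state and not outputs <= state:
                nxt = state | outputs
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(compose({"latex_formula", "theory_name"}, {"proof"}, SERVICES))
# ['kb_lookup', 'parse_latex', 'cas_simplify', 'atp_prove']
```

A realistic broker would also have to deal with service failures and with costs. These aspects are not modelled here.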

3.5 Publishing Tools for Mathematics 

Proof construction is an important but only a small part of a much wider range of
mathematical activities an ideal mathematical assistant system should support
(see Fig. 1). Therefore, the Ωmega system is currently being extended to support
the writing of mathematical publications and to advise students during proof
construction.

With respect to the former, we envision that a mathematician writes a new
paper in some specific mathematical domain using a LaTeX-like environment.
The definitions, lemmas, theorems and especially their proofs give rise to
extensions of the original theory, and writing a proof goes hand in hand with an
interactive proof construction in Ωmega. This allows the development
of mathematical documents in a publishable style that are, in addition, formally
validated by Ωmega, yielding certified mathematical documents. A first
step in that direction is currently under development by linking the WYSIWYG
mathematical editor TeXmacs [vdH01] with the Ωmega proof assistant and
other mathematical support services (see Fig. 7).
Fig. 7. Semantical documents in TeXmacs: The user will be supported by different
dynamic mathematical reasoning services that “understand” the document content.



The TeXmacs system provides LaTeX-like editing and macro-definition
features, and we are defining macros for theory-specific knowledge such as types,
constants, axioms, and lemmata. This allows us to translate new textual
definitions and lemmas into the formal representation, as well as to translate (partial)
textbook proofs into (partial) proof plans.
As a second activity, we are involved in the DFKI project ActiveMath, which
develops an e-learning tool for tutoring students, in particular for advising a
student who is developing a proof. The interaction with the student is to be
conducted via a textual dialog. This scenario is currently under investigation in
the Dialog project [BFG+03] and, aside from all linguistic analysis problems,
gives rise to the problem of under-specification in proofs.
Abstract. We present here an overview of several planning techniques
in robotics. We will not be concerned with the synthesis of abstract
mission and task plans using well-known classical and other domain-independent
planning techniques. We will mainly focus on how to refine
such abstract plans into robust sensory-motor actions and on some planning
techniques that can be useful for that.

The paper introduces the important and mature area of path and motion
planning. It illustrates the usefulness of HTN and MDP planning
techniques for the design of a high level controller for a mobile robot¹.



1 Introduction 

A robot integrates several sensory-motor functions, together with
communication and information-processing capabilities, into cognitive functions in order to
perform a collection of tasks with some level of autonomy and flexibility, in some
class of environments. The sensory-motor functions in a robot are, for example:

• locomotion on wheels, legs, or wings, 

• manipulation with one or several mechanical arms, grippers and hands, 

• localization with odometers, sonars, laser, inertial and GPS sensors,

• scene analysis and environment modeling with a stereo-vision system on a 
pan-and-tilt platform. 

A robot can be designed for tasks and environments such as: 

• manufacturing: painting, welding, loading/unloading a power-press or a 
machine-tool, assembling parts, 

• servicing a store, a warehouse or a factory: maintaining, surveying, or cleaning 
the area, transporting objects, 

• exploring an unknown natural area, e.g., in planetary exploration: building 
a map with characterized landmarks, extracting samples and setting various 
measurement devices, 

• assisting a person in an office, a public area, or at home, 

• performing tele-operated surgical operations, as in the so-called minimally
invasive surgery.

Robotics is a reasonably mature technology when, for example:

• a robot is restricted to operate within well-known and well-engineered
environments, e.g., as in manufacturing robotics,

• a robot is restricted to perform a single simple task, e.g., vacuum cleaning or
lawn mowing.

1 This article is based on revised material from Chapter 20 of [17].

For more diverse tasks and open-ended environments, robotics remains a very 
active research field. 

A robot may or may not integrate planning capabilities. For example, most 
of the one million manufacturing robots deployed today in the manufacturing 
industry do not perform planning per se. Using a robot without planning capa- 
bilities basically requires hand-coding the environment model, and the robot’s 
skills and strategies into a reactive controller. This is a perfectly sensible ap- 
proach as long as this handcoding is inexpensive and reliable enough for the 
application at hand, which is the case if the environment is well-structured and 
stable and if the robot’s tasks are restricted in scope and diversity, with only a 
limited man-robot interaction. 

Programming aids such as hardware tools, e.g., devices for memorizing the 
motion of a pantomime, and software systems, e.g., graphical programming in- 
terfaces, allow for an easy development of a robot’s reactive controller. Learning 
capabilities, supervised or autonomous, significantly extend the scope of appli- 
cability of the approach by allowing a generic controller to adapt to the specifics 
of its environment. This can be done, for example, by estimating and fine-tuning 
control parameters and rules, or by acquiring a map of the environment. 

However, if a robot has to face a diversity of tasks and/or a variety of
environments, then planning will make it simpler to program a robot. It will augment
the robot’s usefulness and robustness. Planning should not be seen as opposed
to the reactive capabilities of a robot, handcoded or learned, nor should it
be seen as opposed to its learning capabilities. It should be closely integrated
with them.

The specific requirements of planning in robotics, as compared to other ap- 
plication domains of planning, are mainly the need to handle: 

• online input from sensors and communication channels; 

• heterogeneous partial models of the environment and of the robot, as well as 
noisy and partial knowledge of the state from information acquired through 
sensors and communication channels; 

• direct integration of planning with acting, sensing, and learning. 

These very demanding requirements argue for addressing planning in robotics
through domain-specific representations and techniques. Indeed, when planning
is integrated within a robot, it usually takes several forms and is implemented
throughout different systems. These forms of robot planning include, in
particular, path and motion planning, perception planning, navigation planning,
manipulation planning, and domain-independent planning.

Today, the maturity of robot planning lies mainly in its domain-specific
planners. Path and motion planning is a mature area that relies on computational
geometry and makes efficient use of probabilistic algorithms. It is already
deployed in robotics and in other application areas such as CAD and computer
animation. Perception planning is a younger and much more open area, although
some focused problems are well advanced, e.g., the viewpoint selection problem,
addressed with mathematical programming techniques.

Domain-independent planning is not widely deployed in robotics for various
reasons, among which are the restrictive assumptions and limited expressiveness
of the classical planning framework. In robotics, task planning should ideally
deal with time and resource allocation, dynamic environments, uncertainty and
partial knowledge, and incremental planning consistently integrated with acting
and sensing. The mature planning techniques available today are mostly effective
at the abstract level of mission planning. Primitives for these plans are tasks
such as “navigate to location5” or “retrieve and pick-up object2”. These tasks
are far from being primitive sensory-motor functions, and their design is very
complex.

Several rule-based or procedure-based systems, such as PRS, RAP, Propice, or
SRCs, make it possible to program by hand closed-loop controllers for these
tasks that handle uncertainty and the integration between acting and sensing.
These high-level reactive controllers permit preprogrammed goal-directed and
event-reactive modalities.

However, planning representations and techniques can also be very helpful for
the design of high-level reactive controllers performing these tasks. They make
it possible to generate, off-line, several alternative complex plans for
achieving the task robustly. They are also useful for finding a policy that
chooses, in each state, the best such plan for pursuing the activity.

The rest of this article presents the important and mature area of path and
motion planning (Section 2). It then illustrates the usefulness of planning
techniques for the design of a high-level navigation controller for a mobile
robot (Section 3). The approach is not limited to navigation tasks; it can be
pursued for a wide variety of robotics tasks, such as object manipulation or
cleaning. Several sensory-motor functions are presented and discussed in
Section 3.1, and an approach that exemplifies the use of planning techniques for
synthesizing alternative plans and policies for a navigation task is described.
The last section refers to more detailed and focused descriptions of the
techniques presented in this overview.

2 Path and Motion Planning 

Path planning is the problem of finding a feasible geometric path in some
environment for moving a mobile system from a starting position to a goal
position. A geometric CAD model of the environment, with the obstacles and the
free space, is assumed to be given. A path is feasible if it meets the kinematic
constraints of the mobile system and avoids collision with obstacles.

Motion planning is the problem of finding a feasible trajectory in space and
time, i.e., a feasible path and a control law along that path that meets the
dynamic constraints (speed and acceleration) of the mobile system. If an optimal
trajectory is not required, it is always possible to label a feasible path
temporally in order to obtain a feasible trajectory. Consequently, motion
planning relies on path planning, on which we focus in the rest of this section.
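The claim that a feasible path can always be labeled temporally can be illustrated with a deliberately conservative scheme. The sketch below assumes a piecewise-linear path in configuration space and hypothetical bounds `v_max` and `a_max` on the norm of the configuration velocity and acceleration; it stops at every vertex and uses a trapezoidal (or, for short segments, triangular) speed profile on each segment. This respects both bounds but is far from time-optimal, and it is not presented as the method of the systems discussed here.

```python
import math
from typing import List, Sequence, Tuple

Config = Tuple[float, ...]


def segment_duration(length: float, v_max: float, a_max: float) -> float:
    """Time to cover `length` starting and ending at rest.

    Trapezoidal profile (accelerate, cruise, decelerate) when the segment is
    long enough to reach v_max, triangular profile otherwise.
    """
    ramp_distance = v_max * v_max / a_max  # distance used to speed up and slow down
    if length >= ramp_distance:
        return length / v_max + v_max / a_max
    return 2.0 * math.sqrt(length / a_max)


def time_label(path: Sequence[Config], v_max: float,
               a_max: float) -> List[Tuple[float, Config]]:
    """Attach a timestamp to each vertex of a piecewise-linear path.

    Assumption (hypothetical): stop-and-go motion, so the robot is at rest at
    every vertex; this keeps the speed and acceleration bounds trivially valid.
    """
    timed = [(0.0, tuple(path[0]))]
    t = 0.0
    for q0, q1 in zip(path, path[1:]):
        t += segment_duration(math.dist(q0, q1), v_max, a_max)
        timed.append((t, tuple(q1)))
    return timed
```

For example, `time_label([(0.0, 0.0), (4.0, 0.0), (4.0, 0.5)], 1.0, 1.0)` reaches the second vertex at t = 5.0 (4/1 + 1/1) and the third at about t = 6.41 (a triangular move lasting 2·sqrt(0.5)).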







Malik Ghallab 





Fig. 1. Hilare, a car-like robot with an arm and a trailer (left); HRP, a humanoid robot
(right).



If the mobile system of interest is a free-flying rigid body, i.e., if it can
move freely in space in any direction without any kinematic constraint, then six
configuration parameters are needed to characterize its position: x, y, z and the
three Euler angles. Path planning defines a path in this six-dimensional space.
However, a robot is not a free-flying body. Its kinematics defines its possible
motions. For example, a car-like robot has three configuration parameters, x, y,
and θ. Usually these three parameters are not independent, e.g., the robot may
or may not be able to turn on the spot (change θ while keeping x and y fixed), or
to move sideways. A mechanical arm that has n rotational joints needs n
configuration parameters to characterize its configuration in space, in addition
to constraints such as the maximum and minimum values of each joint angle. The
car-like robot Hilare in Figure 1 (left) has a total of 10 configuration
parameters: 6 for the arm and 4 for the mobile platform with the trailer [34].
The humanoid robot HRP in Figure 1 (right) has 52 configuration parameters: 2 for
the head, 7 for each arm, 6 for each leg and 12 for each hand (four fingers with
3 configuration parameters each) [26, 27].²

Given a robot with n configuration parameters and some environment, let us
define:

• q, the configuration of the robot: an n-tuple of reals that specifies the n
parameters needed to characterize the position of the robot in space;

• CS, the configuration space of the robot: the set of values that its
configuration q may take;

• CS_free, the free configuration space: the subset of CS of configurations that
are not in collision with the obstacles of the environment.

² The degrees of freedom of a mobile system are its control variables; an arm or
the humanoid robot has as many degrees of freedom as configuration parameters,
whereas a car-like robot has 3 configuration parameters but only two degrees of
freedom.

Path planning is the problem of finding a path in the free configuration space
CS_free between an initial and a final configuration. If one could compute
CS_free explicitly, then path planning would be a search for a path in this
n-dimensional continuous space. However, the explicit definition of CS_free is a
computationally difficult problem, both theoretically (it is exponential in the
dimension of CS) and practically. Fortunately, very efficient probabilistic
techniques have been designed that solve path planning problems even for highly
complex robots and environments. They rely on the following two operations:

• collision checking, which checks whether a configuration q ∈ CS_free, or
whether a path between two configurations in CS is collision-free, i.e., whether
it lies entirely in CS_free;

• kinematic steering, which finds a path between two configurations q and q' in
CS that meets the kinematic constraints, without taking obstacles into account.

Both operations can be performed efficiently. Collision checking relies on
computational geometry algorithms and data structures [19]. Kinematic steering
may use one of several algorithms, depending on the type of kinematic constraints
the robot has. For example, Manhattan paths are applied to systems that are
required to move only one configuration parameter at a time. Special curves
(called Reeds and Shepp curves [43]) are applied to car-like robots that cannot
move sideways. If the robot has no kinematic constraints, then straight line
segments in CS from q to q' are used. Several such algorithms can be combined.
For example, to plan paths for the robot Hilare in Figure 1 (left), straight
line segments for the arm are combined with dedicated curves for the mobile
platform with a trailer [34].
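Both operations can be sketched in a toy setting. The code below assumes a hypothetical two-dimensional point robot among circular obstacles, so that CS is the plane and a configuration is (x, y). It implements straight-line steering, Manhattan steering (one parameter at a time, with endpoints put in a canonical order so that steering is symmetric), and a collision check that samples a path at a fixed resolution. Sampling is only an approximation; actual collision checkers rely on the geometric algorithms and data structures cited above.

```python
import math
from typing import List, Sequence, Tuple

Config = Tuple[float, float]
Obstacle = Tuple[float, float, float]  # hypothetical disc: (center_x, center_y, radius)


def in_free_space(q: Config, obstacles: Sequence[Obstacle]) -> bool:
    """Collision check of a single configuration of a point robot."""
    return all(math.hypot(q[0] - cx, q[1] - cy) > r for cx, cy, r in obstacles)


def steer_straight(q: Config, q2: Config) -> List[Config]:
    """Steering without kinematic constraints: the segment [q, q2]."""
    return [q, q2]


def steer_manhattan(q: Config, q2: Config) -> List[Config]:
    """Manhattan steering: change x first, then y.

    Endpoints are sorted so that steer(q, q2) is steer(q2, q) reversed.
    """
    a, b = sorted([q, q2])
    path = [a, (b[0], a[1]), b]
    return path if a == q else path[::-1]


def path_free(polyline: Sequence[Config], obstacles: Sequence[Obstacle],
              step: float = 0.01) -> bool:
    """Approximate test that a polyline lies in CS_free, sampling every `step`."""
    for a, b in zip(polyline, polyline[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(n + 1):
            t = i / n
            p = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if not in_free_space(p, obstacles):
                return False
    return True
```

With a disc of radius 0.3 centered at (1, 0), the straight path from (0, 0) to (2, 0) is rejected, whereas the Manhattan path from (0, 1) to (2, 0), which goes through (2, 1), stays at distance 1 from the center and is accepted.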

Let ℒ(q, q') be the path in CS computed by the kinematic steering algorithm for
the constraints of the robot of interest; ℒ is assumed to be symmetric.

Let ℛ be a graph whose vertices are configurations in CS_free; two vertices q
and q' are adjacent in ℛ only if ℒ(q, q') lies entirely in CS_free. ℛ is called a
roadmap for CS_free.

Since ℒ is symmetric, ℛ is an undirected graph. Note that every pair of adjacent
vertices in ℛ is connected by a path in CS_free, but the converse is not
necessarily true. Given a roadmap for CS_free and two configurations q_i and q_g
in CS_free, corresponding to the initial and goal configurations, a feasible
path from q_i to q_g can be found as follows:

• find a configuration q'_i ∈ ℛ such that ℒ(q_i, q'_i) ⊆ CS_free;

• find a configuration q'_g ∈ ℛ such that ℒ(q'_g, q_g) ⊆ CS_free;

• find in ℛ a sequence of adjacent configurations from q'_i to q'_g.

If these three steps succeed, then the planned path is the finite sequence of
subpaths ℒ(q_i, q'_i), …, ℒ(q'_g, q_g). In a post-processing step, this sequence
is easily optimized and smoothed locally by finding shortcuts in CS_free between
successive legs.
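The three query steps can be sketched as follows. The roadmap is assumed to be given as an adjacency dictionary, and `segment_free(q, q2)` is a hypothetical callback standing for the test ℒ(q, q2) ⊆ CS_free. Breadth-first search, started from all candidate q'_i at once, is used for the graph search step; any other graph search would do. The result is the list of waypoints, to be joined by steering paths and then smoothed.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional

Config = Hashable


def query_roadmap(roadmap: Dict[Config, List[Config]], q_init: Config,
                  q_goal: Config,
                  segment_free: Callable[[Config, Config], bool]) -> Optional[List[Config]]:
    # Steps 1 and 2: roadmap vertices reachable from q_init / q_goal by one
    # steering path (segment_free is a hypothetical collision-check callback).
    starts = [v for v in roadmap if segment_free(q_init, v)]
    goals = {v for v in roadmap if segment_free(v, q_goal)}
    # Step 3: breadth-first search in the roadmap from every candidate q'_i.
    parent: Dict[Config, Optional[Config]] = {v: None for v in starts}
    frontier = deque(starts)
    while frontier:
        v = frontier.popleft()
        if v in goals:
            chain = []  # rebuild q'_g back to q'_i
            while v is not None:
                chain.append(v)
                v = parent[v]
            return [q_init] + chain[::-1] + [q_goal]
        for w in roadmap.get(v, []):
            if w not in parent:
                parent[w] = v
                frontier.append(w)
    return None  # the roadmap does not connect q_init and q_goal
```

Returning `None` corresponds to a failure of one of the three steps; with a roadmap that covers CS_free, this happens only when q_i and q_g lie in different connected components of CS_free.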
Given a roadmap ℛ, path planning is reduced to a simple graph search problem, in
addition to collision checking and kinematic steering operations. There remains
the problem of finding a roadmap that covers CS_free, i.e., whenever there is a
path in CS_free between two configurations, there is also a path through the
roadmap. Finding such a roadmap using probabilistic techniques turns out to be
easier than computing CS_free explicitly.

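Let us define the coverage domain of a configuration q to be the set:

$$\mathcal{D}(q) = \{\, q' \in CS_{\mathrm{free}} \mid \mathcal{L}(q, q') \subseteq CS_{\mathrm{free}} \,\}.$$

A set of configurations Q covers CS_free if:

$$\bigcup_{q \in Q} \mathcal{D}(q) = CS_{\mathrm{free}}.$$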

The algorithm Probabilistic-Roadmap (Figure 2) starts with an empty roadmap. It
randomly draws a configuration q ∈ CS_free; q is added to the current roadmap ℛ
iff either:

• q extends the coverage of ℛ, i.e., there is no other configuration in ℛ whose
coverage domain includes q, or

• q extends the connectivity of ℛ, i.e., q makes it possible to connect two
configurations of ℛ that are not already connected in ℛ.



Probabilistic-Roadmap(ℛ)
   iterate until (termination condition)
      draw a random configuration q in CS_free
      if ∀q' ∈ ℛ: ℒ(q, q') ⊄ CS_free then add q to ℛ
      else if there are q1 and q2 unconnected in ℛ such that
              ℒ(q, q1) ⊆ CS_free and ℒ(q, q2) ⊆ CS_free
           then add q and the edges (q, q1) and (q, q2) to ℛ
   end iteration
   return(ℛ)



Fig. 2. Probabilistic roadmap generation for path planning.
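A runnable sketch of the algorithm of Figure 2 follows, under explicit simplifying assumptions: `sample_free` and `segment_free` are hypothetical callbacks for drawing a random configuration in CS_free and for testing ℒ(q, q') ⊆ CS_free; connected components are tracked with a union-find structure; and the termination condition is the one discussed in the text, i.e., stop after `k_max` consecutive draws that did not extend coverage. Edges are returned as pairs of vertex indices.

```python
from typing import Callable, Dict, List, Set, Tuple

Config = Tuple[float, float]


def probabilistic_roadmap(sample_free: Callable[[], Config],
                          segment_free: Callable[[Config, Config], bool],
                          k_max: int = 100) -> Tuple[List[Config], Set[Tuple[int, int]]]:
    vertices: List[Config] = []
    edges: Set[Tuple[int, int]] = set()
    parent: Dict[int, int] = {}  # union-find over vertex indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def add_vertex(q: Config) -> int:
        vertices.append(q)
        parent[len(vertices) - 1] = len(vertices) - 1
        return len(vertices) - 1

    k = 0  # draws since the last coverage-extending addition
    while k < k_max:
        k += 1
        q = sample_free()
        visible = [i for i, v in enumerate(vertices) if segment_free(q, v)]
        if not visible:
            add_vertex(q)  # first case: q extends the coverage of the roadmap
            k = 0
            continue
        first_in_component: Dict[int, int] = {}
        for i in visible:
            first_in_component.setdefault(find(i), i)
            if len(first_in_component) == 2:
                # second case: q connects two unconnected parts of the roadmap
                i1, i2 = first_in_component.values()
                j = add_vertex(q)
                edges.update({(j, i1), (j, i2)})
                parent[find(i1)] = j
                parent[find(i2)] = j
                break
    return vertices, edges
```

As in Figure 2, a configuration that extends coverage is added without edges, while a configuration that extends connectivity is added together with its two edges; only the former resets the counter k.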

Let us assume that there is a finite set Q that covers CS_free.³ Consider the
roadmap ℛ that contains all the configurations in Q and, for every pair q1 and q2
in Q such that 𝒟(q1) and 𝒟(q2) intersect, also contains a configuration
q ∈ 𝒟(q1) ∩ 𝒟(q2) and the two edges (q, q1) and (q, q2). It is possible to show
that ℛ has the following property: if there exists a feasible path between two
configurations q_i and q_g in CS_free, then there are two configurations q'_i and
q'_g in the roadmap ℛ such that q_i ∈ 𝒟(q'_i), q_g ∈ 𝒟(q'_g), and q'_i and q'_g
are in the same connected component of ℛ. Note that the roadmap may have several
connected components, which reflect those of CS_free.

³ Depending on the shape of CS_free and the kinematic constraints handled in ℒ,
there may or may not exist such a finite set of configurations that covers
CS_free [29].

The Probabilistic-Roadmap algorithm does not generate a roadmap that meets the
above property deterministically, but only up to some probability value, which
is linked to the termination condition. Let k be the number of random draws since
the last draw of a configuration q that has been added to the roadmap because q
extends the coverage of the current ℛ (q meets the first if clause in Figure 2).
The termination condition is to stop when k reaches a preset value k_max. It has
been shown that 1/k_max is a probabilistic estimate of the ratio between the part
of CS_free not covered by ℛ and the total CS_free. In other words, for
k_max = 1000 the algorithm generates a roadmap that covers CS_free with a
probability of 0.999.
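A short calculation, added here as an illustration rather than taken from the cited analysis, shows why this termination rule is reasonable. Assume that draws are uniform over CS_free, write μ for volume in CS and p for the fraction of CS_free not covered by ℛ (notation introduced only for this example). A draw in the uncovered part always extends coverage, so the probability that k_max consecutive draws all fail to do so is bounded as follows:

$$p = \frac{\mu\big(CS_{\mathrm{free}} \setminus \bigcup_{q \in \mathcal{R}} \mathcal{D}(q)\big)}{\mu(CS_{\mathrm{free}})}, \qquad \Pr[\text{stop}] = (1 - p)^{k_{\max}} \le e^{-p\,k_{\max}}.$$

For k_max = 1000 and an uncovered fraction p = 0.005, five times larger than 1/k_max = 0.001, the bound is e^{-5} ≈ 0.007, so terminating with that much of CS_free still uncovered is unlikely.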

From a practical point of view, the probabilistic roadmap technique illustrated
by the previous algorithm has led to some very efficient implementations and to
marketed products used in robotics, computer animation, CAD and manufacturing
applications. Typically, for a complex robot and environment, and k_max on the
order of a few hundred, it takes about a minute to generate a roadmap on a normal
desktop machine; the size of ℛ is about a hundred configurations; path planning
with the roadmap takes a few milliseconds. This is illustrated for the Hilare
robot in Figure 3, where the task is to carry a long rod that constrains the path
through the door: the roadmap in this 9-dimensional space has about 100 vertices
and is generated in less than one minute. The same techniques have also been
successfully applied to manipulation planning problems.

3 Planning for the Design of a Robust Controller 

Consider an autonomous mobile robot in a structured environment, such as the
robot in Figure 1 (left), which is equipped with several sensors (sonar, laser,
vision), with actuators, and with an arm. The robot also has several software
modules for the same sensory-motor (sm) function, e.g., for localization, for
map building and updating, or for motion planning and control. These redundant
sm functions are needed because a sensor may fail, and because no single method
or sensor has universal coverage; each has its weak points and drawbacks.
Robustness requires a diversity of means for achieving an sm function.
Robustness also requires the capability to combine several such sm functions
consistently into a plan appropriate for the current context.

The planning techniques described in this section illustrate this capability.
They enable a designer to specify, off-line, very robust ways of performing a
task such as “navigate to”. The designer specifies a collection of Hierarchical
Task Networks (HTNs), as illustrated in Figure 4, that are complex plans, called
modes of behavior, or modalities for short,⁴ whose primitives are sm functions.
Each modality is a possible way of combining a few of these sm functions to
achieve the desired task. A modality has a rich context-dependent control
structure.

⁴ In robotics, behaviors generally have a meaning different from that of our modalities.
Fig. 3. Initial and goal configurations (top left and right) of a path planning
problem, and generated path (bottom).



It includes alternatives whose selection depends on the data provided by sm 
functions. 

Several modalities are available for a given task. The choice of the right
modality for pursuing a task is far from obvious. However, the relationship
between control states and modalities can be expressed as a Markov Decision
Process (MDP). This MDP characterizes the robot’s abilities for that task. The
probability and cost distributions of this MDP are estimated by moving the robot
in the environment. The controller is driven by policies extracted on-line from
this MDP.
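To make the last step concrete, here is a minimal value-iteration sketch. It assumes a small, hypothetical MDP whose states are control states, whose actions are modalities, and whose transition probabilities and step costs are given as dictionaries, standing in for the estimates obtained by moving the robot. The policy picks, in each state, the modality that minimizes the expected cost to reach the goal. State names, modality labels and numbers in the test are illustrative, and the original controller is not claimed to use this exact procedure.

```python
from typing import Dict, List, Tuple

State = str
Modality = str
# transitions[(s, m)] = {next_state: probability}; cost[(s, m)] = expected step cost
Transitions = Dict[Tuple[State, Modality], Dict[State, float]]
Costs = Dict[Tuple[State, Modality], float]


def q_value(s: State, m: Modality, transitions: Transitions, cost: Costs,
            value: Dict[State, float]) -> float:
    """Expected cost of using modality m in s, then following `value`."""
    return cost[(s, m)] + sum(p * value[s2] for s2, p in transitions[(s, m)].items())


def value_iteration(transitions: Transitions, cost: Costs, goal: State,
                    eps: float = 1e-9, max_iter: int = 10_000):
    """Minimize expected cost-to-goal; return (values, policy)."""
    actions: Dict[State, List[Modality]] = {}
    for s, m in transitions:
        actions.setdefault(s, []).append(m)
    states = set(actions) | {s2 for d in transitions.values() for s2 in d} | {goal}
    missing = states - set(actions) - {goal}
    if missing:
        raise ValueError(f"no modality available in states {sorted(missing)}")
    value = {s: 0.0 for s in states}  # the goal is absorbing with zero cost-to-go
    for _ in range(max_iter):
        delta = 0.0
        for s, ms in actions.items():
            if s == goal:
                continue
            new = min(q_value(s, m, transitions, cost, value) for m in ms)
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < eps:
            break
    policy = {s: min(ms, key=lambda m: q_value(s, m, transitions, cost, value))
              for s, ms in actions.items() if s != goal}
    return value, policy
```

In the test case, a modality that is cheaper per step but succeeds only half of the time (expected cost 20) loses to a more reliable one (expected cost 12/0.9 ≈ 13.3), which is the kind of trade-off the estimated MDP is meant to capture.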

To summarize, this approach involves three components: 

• Sensory-motor functions, which are the primitive actions. 

• Modalities that are HTN plans. Alternative modalities offer different ways of
combining the sm functions within a task.

• MDPs whose policies are used by the controller to achieve the task. 

Let us describe these three levels successively. 
3.1 Sensory-Motor Functions

The sensory-motor functions illustrated here and the control system itself rely 
on a model of the environment learned and maintained by the robot. The basic 
model is a 2D map of obstacle edges acquired from the laser range data. The 
so-called Simultaneous Localization and Mapping (SLAM) technique is used to 
generate and maintain the map of the environment. 

A labeled topological graph of the environment is associated with the 2D 
map. Cells are polygons that partition the metric map. Each cell is characterized 
by its name and a color that corresponds to navigation features such as Corridor, 
Corridor with landmarks, Large Door, Narrow Door, Confined Area, Open Area, 
Open Area with fixed localization devices⁵. Edges of the topological graph are
labeled by estimates of the transition length from one cell to the next and by 
heuristic estimates of how easy such a transition is. 
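As a hypothetical illustration of how such a labeled graph can be exploited, the sketch below runs Dijkstra’s algorithm over cells, with an edge weight combining the estimated transition length and a difficulty estimate in [0, 1]. The combination rule (length multiplied by 1 + difficulty) and the cell names are assumptions made for the example, not the weighting used in the original system.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# graph[cell] = list of (neighbor, estimated_length, difficulty in [0, 1])
Graph = Dict[str, List[Tuple[str, float, float]]]


def best_route(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra over the topological graph; returns (weighted cost, cells) or None."""
    dist = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            route = []
            while cell is not None:
                route.append(cell)
                cell = parent[cell]
            return d, route[::-1]
        if d > dist[cell]:
            continue  # stale heap entry
        for nxt, length, difficulty in graph.get(cell, []):
            nd = d + length * (1.0 + difficulty)  # hypothetical weighting
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = cell
                heapq.heappush(heap, (nd, nxt))
    return None
```

With the weights of the test below, the shorter route through a difficult door (2 + 12 = 14 in length) costs 21 after weighting and loses to the easier route through the corridor (15).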

An sm function returns to the controller a report indicating either the end of a
normal execution or giving additional information about a non-nominal execution.
To give the reader an idea of the “low-level” primitives available on a robot,
of their strong and weak points, and of how they can be used from a planning
point of view, let us discuss some of these sm functions.

Segment-Based Localization. This function relies on the map maintained by the
robot from laser range data. The SLAM technique uses an estimation approach
called Extended Kalman Filtering to match the local perception with the
previously built model. It offers a continuous position-updating mode, used when
a good probabilistic estimate of the robot position is available. This sm
function estimates the inaccuracy of the robot localization. When the robot is
lost, a re-localization mode can be run: a constraint relaxation on the position
inaccuracy extends the search space until a good match with the map is found.

This sm function is generally reliable, robust to partial occlusions, and much
more precise than odometry. However, occlusion of the laser beam by obstacles
gives unreliable data. This case occurs when dense unexpected obstacles are
gathered in front of the robot. Moreover, in long corridors the laser obtains no
data along the corridor axis, so the inaccuracy increases along that axis, and
restarting the position-updating loop in a long corridor can prove difficult.
Feedback from this sm function can be a report of bad localization, which warns
that the inaccuracy of the robot position has exceeded an allowed threshold. The
robot then stops, turns on the spot and reactivates the re-localization mode.
This can be repeated in order to find a non-ambiguous corner in the environment
to restart the localization loop.
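The growth of inaccuracy along a corridor axis, and the resulting bad-localization report, can be illustrated with a one-dimensional Kalman filter. This is a deliberately simplified stand-in for the Extended Kalman Filter of the actual sm function: the state is only the robot position along the corridor axis, odometry inflates the variance at every step, a segment match (when one is available) shrinks it, and a report is raised when the standard deviation exceeds a threshold. All parameter names and values are hypothetical, and the recovery behavior (stop, turn, re-localize) is not modeled.

```python
import math
from typing import List, Optional


def localize_1d(odometry: List[float], matches: List[Optional[float]],
                x0: float = 0.0, var0: float = 0.01,
                motion_var: float = 0.02, meas_var: float = 0.01,
                max_std: float = 0.5) -> List[str]:
    """Return one report per step: 'nominal' or 'bad-localization'.

    matches[k] is a position measured along the axis, or None when the laser
    sees no segment constraining that axis (hypothetical long-corridor case).
    """
    x, var = x0, var0
    reports = []
    for dx, z in zip(odometry, matches):
        # Prediction: dead reckoning moves the estimate and adds uncertainty.
        x, var = x + dx, var + motion_var
        # Correction: only when a map segment constrains this axis.
        if z is not None:
            gain = var / (var + meas_var)
            x, var = x + gain * (z - x), (1.0 - gain) * var
        reports.append("bad-localization" if math.sqrt(var) > max_std else "nominal")
    return reports
```

Without any match, the variance after k steps is 0.01 + 0.02k, so with `max_std=0.45` the first nine reports are nominal and the tenth (variance 0.21) is a bad-localization report; with a match at every step, the variance stays below `meas_var` and all reports are nominal.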

Localization on Visual Landmarks. This function relies on calibrated monocular
vision to detect known landmarks such as doors or wall posters. It derives a very
accurate estimate of the robot position from the perceptual data. Setup is
simple: a few wall posters and characteristic planar features on walls are
learned in supervised mode. However, landmarks are available and visible only in
a few areas of the environment. Hence this sm function is mainly used to update
the last known robot position from time to time. Feedback from this sm function
is a report of a potentially visible landmark, which indicates that the robot is
entering an area of visibility of a landmark. The robot stops, turns towards the
expected landmark, and searches for it using the pan-tilt mount. A failure
report notifies that the landmark was not identified. If needed, the robot
retries from a second predefined position in the landmark visibility area.

⁵ Environment modeling techniques that can automatically acquire such a
topological graph, with its cells and their labels, do exist; they are discussed
in Section 4. However, in the work referred to here, the topological graph is
hand-programmed.

Absolute Localization. The environment may have areas equipped with calibrated
fixed devices, such as infrared reflectors or cameras, or even areas where a
differential GPS signal is available. These devices permit a very accurate and
robust localization, but the sm function works only when the robot is within a
covered area.

Elastic Band for Plan Execution. This sm function dynamically updates and
maintains a flexible trajectory, represented as an elastic band, i.e., a sequence
of configurations from the current robot position to the goal. Connectivity
between configurations relies on a set of internal forces that are used to
optimize the global shape of the path. External forces are associated with
obstacles and are applied to all configurations in the band in order to modify
the path dynamically and take it away from obstacles. This sm function takes into
account the planned path, the map and the on-line input from the laser data. It
gives a robust method for long-range navigation. However, the band deformation
is a local optimization between internal and external forces; the technique may
get trapped in local minima. This is the case when a mobile obstacle blocks the
band against another obstacle. Furthermore, it is a costly process, which may
limit reactivity in certain cluttered, dynamic environments. This also limits
the band length.

The feedback may warn that the band execution is blocked by a temporary obstacle
that cannot be avoided (e.g., a closed door, an obstacle in a corridor). This
obstacle is perceived by the laser and is not represented in the map. If the
band relies on a planned path, the new obstacle is added to the map, a new
trajectory taking the unexpected obstacle into account is computed, and a new
elastic band is executed. Another report may warn that the current band is no
longer adapted to the planned path. In this case, a new band has to be created.
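The balance between internal and external forces can be sketched in a few lines. The code below is a toy two-dimensional version, not the implementation of the sm function: the band is a list of points whose endpoints (robot and goal) stay fixed, the internal force pulls each point toward the midpoint of its neighbors, the external force pushes it away from hypothetical point obstacles closer than an influence distance, and each iteration moves the points by a small step along the resulting force. Gains, distances and obstacle positions are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def relax_band(band: List[Point], obstacles: List[Point], iterations: int = 200,
               k_int: float = 0.5, k_ext: float = 0.5, influence: float = 1.0,
               step: float = 0.2) -> List[Point]:
    """Deform a band under contraction and obstacle repulsion (toy model)."""
    band = list(band)
    for _ in range(iterations):
        new = [band[0]]  # current robot position stays fixed
        for i in range(1, len(band) - 1):
            (px, py), (ax, ay), (bx, by) = band[i], band[i - 1], band[i + 1]
            # Internal force: contraction toward the neighbors' midpoint.
            fx = k_int * ((ax + bx) / 2 - px)
            fy = k_int * ((ay + by) / 2 - py)
            # External forces: repulsion from obstacles within `influence`.
            for ox, oy in obstacles:
                d = math.hypot(px - ox, py - oy)
                if 1e-9 < d < influence:
                    push = k_ext * (influence - d) / d
                    fx += push * (px - ox)
                    fy += push * (py - oy)
            new.append((px + step * fx, py + step * fy))
        new.append(band[-1])  # goal stays fixed
        band = new
    return band
```

Because the update only follows local forces, the sketch inherits the weakness mentioned in the text: it converges to a local equilibrium and cannot by itself decide to go around an obstacle on the other side.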

Reactive Obstacle Avoidance. This sm function provides a reactive motion
capability towards a goal without needing a planned path. It extracts from
sensory data a description of free regions and selects the region closest to the
goal, taking into account the distance to the obstacles; it then computes and
tries to achieve a motion command towards that region.

This sm function offers a reactive motion capability that remains efficient in a
cluttered space. However, like all reactive methods, it may fall into local
minima, and it is not appropriate for long-range navigation. Its feedback is a
failure report generated when the reactive execution is blocked.
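A toy version of the region-selection step can be written over a polar laser scan. The sketch assumes a hypothetical scan given as (bearing, range) pairs in the robot frame, treats a beam as free when its range exceeds a clearance distance, and picks the free bearing closest to the goal direction, breaking ties in favor of larger clearance. The actual sm function extracts free regions rather than single beams, so this only approximates the idea.

```python
import math
from typing import List, Optional, Tuple


def choose_heading(scan: List[Tuple[float, float]], goal_bearing: float,
                   clearance: float = 0.8) -> Optional[float]:
    """Return the bearing (radians) to steer toward, or None if blocked."""
    def angle_diff(a: float, b: float) -> float:
        # Smallest absolute difference between two angles, wrapped to [0, pi].
        return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

    free = [(b, r) for b, r in scan if r > clearance]
    if not free:
        return None  # corresponds to the failure report: reactive motion blocked
    best = min(free, key=lambda br: (angle_diff(br[0], goal_bearing), -br[1]))
    return best[0]
```

Each call only looks at the current scan, which is why such a controller can oscillate or stall in local minima, as noted above.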
Finally, let us mention that a path planner (as described in Section 2) may also
be seen as an sm function from the viewpoint of a high-level navigation
controller. Note that a planned path does not take into account environment
changes and new obstacles. Furthermore, a path planner may not succeed in
finding a path. This may happen when the initial or goal configurations are too
close to obstacles: because of the inaccuracy of the robot position, these
configurations are detected as being outside of CS_free. The robot then has to
move away from the obstacles by using a reactive motion sm function before a new
path is queried.



3.2 Modalities 

A navigation task such as (Goto x y θ), given by a mission planning step,
requires an integrated use of several sm functions among those presented earlier.
Each consistent combination of these sm functions is a particular plan called a
modality. A navigation modality is one way of performing the navigation task. A
modality has specific characteristics that make it more appropriate for some
contexts or environments, and less for others. We will discuss later how the
controller chooses the appropriate modality. Let us illustrate some of these
modalities for the navigation task before detailing the HTN representation of
modalities and the associated control system.

Modality M1 uses three sm functions: the path planner, the elastic band for the
dynamic motion execution, and the laser-based localization. When M1 is chosen to
carry out a navigation, the laser-based localization is initialized and the
robot position is maintained dynamically. A path is computed to reach the goal
position, and it is carried out by the elastic band sm function. Stopping the
modality interrupts the band execution and the localization loop; it restores
the initial state of the map if temporary obstacles have been added to it.
Suspending the modality stops the band execution; the path, the band and the
localization loop are maintained. A suspended modality can be resumed by
restarting the execution of the current elastic band.

Modality M2 uses three sm functions: the path planner, the reactive obstacle
avoidance and the laser-based localization. The path planner provides way-points
(vertices of the trajectory) to the reactive motion function. Despite these
way-points, the reactive motion can be trapped in local minima in cluttered
environments. Its avoidance capability is higher than that of the elastic band
sm function. However, the reactivity to obstacles and the attraction to
way-points may lead to oscillations and to a discontinuous motion that confuses
the localization sm function. This is a clear drawback for M2 in long corridors.

Modality M3 is like M2 but without path planning and with a reduced speed in
obstacle avoidance. It starts with the reactive motion and the laser-based
localization loop. It offers an efficient alternative in narrow environments
like offices, and in cluttered spaces where path planning may fail. It can be
preferred to modality M1 in order to avoid unreliable re-planning steps if the
elastic band is blocked by a cluttered environment. Navigation is only reactive,
hence subject to the local minima problem. The weakness of the laser
localization in long corridors is also a drawback for M3.

Modality M4 uses the reactive obstacle avoidance sm function with the odometer
and the visual landmark localization sm functions. The odometer inaccuracy can
be locally reset by the visual localization sm function when the robot goes by a
known landmark. Reactive navigation between landmarks makes it possible to cross
a corridor without accurate knowledge of the robot position. Typically, this M4
modality can be used in long corridors. The growing inaccuracy can make it
difficult to find the next landmark. The search method allows for some
inaccuracy on the robot position by moving the cameras, but this inaccuracy
cannot exceed one meter. For this reason, landmarks should not be too far apart
with respect to the required updating of the odometry estimate. Furthermore, the
reactive navigation of M4 may fall into local minima.

Modality M5 relies on the reactive obstacle avoidance sm function and the 
absolute localization sm function when the robot is within an area equipped 
with absolute localization devices. 

Modalities are represented as Hierarchical Task Networks. The HTN formalism is
well suited to modalities because of its expressiveness and its flexible control
structure. HTNs offer a middle ground between programming and automated
planning, allowing the designer to express the control knowledge that is
available here.

An internal node of the HTN And/Or tree is a task or a subtask that can be
pursued in different context-dependent ways, which are the Or-connectors. Each
such Or-connector is a possible decomposition of the task into a conjunction of
subtasks. There are two types of And-connectors: with sequential or with
parallel branches. Branches linked by a sequential And-connector are traversed
sequentially in the usual depth-first manner. Branches linked by a parallel
And-connector are traversed in parallel. The leaves of the tree are primitive
actions, each corresponding to a single query to an sm function. Thus, a root
task is dynamically decomposed, according to the context, into a set of
primitive actions organized as concurrent or sequential subsets. Execution starts
as soon as the decomposition process reaches a leaf, even if the decomposition
of the entire tree is not complete.
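The And/Or structure can be captured by a small data type and a recursive decomposition. The sketch below is a simplification under stated assumptions: each Or-node lists alternatives guarded by hypothetical context predicates, and the first applicable one is chosen; And-nodes are tagged sequential or parallel; leaves are sm-function queries. Instead of executing, it returns the resulting plan as nested `"seq"`/`"par"` tuples, so the interleaving of decomposition and execution described above is not modeled. Task and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Context = Dict[str, object]


@dataclass
class Leaf:
    sm_query: str  # one query to one sm function


@dataclass
class And:
    children: List["Node"]
    parallel: bool = False  # sequential And-connector by default


@dataclass
class Or:
    # (applicability test on the context, decomposition), tried in order
    alternatives: List[Tuple[Callable[[Context], bool], "Node"]]


Node = Union[Leaf, And, Or]


def decompose(node: Node, ctx: Context):
    """Expand a task into nested ('seq' | 'par', ...) tuples of sm queries."""
    if isinstance(node, Leaf):
        return node.sm_query
    if isinstance(node, And):
        kind = "par" if node.parallel else "seq"
        return (kind, *[decompose(child, ctx) for child in node.children])
    for applicable, sub in node.alternatives:
        if applicable(ctx):
            return decompose(sub, ctx)
    raise ValueError("no applicable decomposition in this context")
```

For instance, a start task whose first alternative applies when the robot is localized could expand into `("seq", "plan_path", ("par", "band_execute", "laser_localize"))`, and into a re-localization sequence otherwise.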

A primitive action can be blocking or non-blocking. In blocking mode, the
control flow waits until the end of this action is reported before starting the
next action in the sequence. In non-blocking mode, actions in a sequence are
triggered one after the other without waiting for feedback. A blocking primitive
action is considered ended after a report has been issued by the sm function and
that report has been processed by the control system. The report from a
non-blocking primitive action may occur and be processed after an unpredictable
delay.

The modality tree illustrated in Figure 4 starts with six Or-connectors labeled
start, stop, suspend, resume, succeed and fail. The start connector represents
the nominal modality execution; the stop connector represents the way to stop
the modality and to restore the neutral state, characterized by the absence of
any sm function execution. Furthermore, the environment model modified by the
modality execution recovers its previous form. The suspend and resume connectors
are triggered by the control system described below. The suspend connector stops
the execution by freezing the state of the active sm functions. The resume
connector restarts the modality execution from such a frozen state. The fail
(resp. succeed) connector is followed when the modality execution reaches a
failure (resp. a success) end. These connectors are used to restore the neutral
state and to allow certain executions required in these specific cases.

Fig. 4. Part of modality M1.
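The life cycle implied by the six connectors can be summarized as a small state machine. This is an assumed reading of the text, not a transcription of Figure 4: a modality is neutral, running, or suspended; start, suspend and resume move between these phases, while stop, succeed and fail return it to the neutral state, which also restores the environment model saved when the modality was started. The class name and the dictionary standing for the environment model are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class Phase(Enum):
    NEUTRAL = "neutral"      # no sm function of the modality is running
    RUNNING = "running"      # nominal execution (start connector)
    SUSPENDED = "suspended"  # active sm functions frozen


# (current phase, connector) -> next phase; other combinations are rejected.
TRANSITIONS: Dict[Tuple[Phase, str], Phase] = {
    (Phase.NEUTRAL, "start"): Phase.RUNNING,
    (Phase.RUNNING, "suspend"): Phase.SUSPENDED,
    (Phase.SUSPENDED, "resume"): Phase.RUNNING,
    (Phase.RUNNING, "succeed"): Phase.NEUTRAL,
    (Phase.RUNNING, "fail"): Phase.NEUTRAL,
    (Phase.RUNNING, "stop"): Phase.NEUTRAL,
    (Phase.SUSPENDED, "stop"): Phase.NEUTRAL,
    (Phase.SUSPENDED, "fail"): Phase.NEUTRAL,
}


class ModalityState:
    """Tracks one modality; restores the environment model on return to neutral."""

    def __init__(self, name: str, env_model: Dict[str, object]) -> None:
        self.name = name
        self.phase = Phase.NEUTRAL
        self.env_model = env_model  # hypothetical stand-in for the shared model
        self._snapshot: Dict[str, object] = {}

    def fire(self, connector: str) -> Phase:
        key = (self.phase, connector)
        if key not in TRANSITIONS:
            raise ValueError(f"{connector!r} not allowed while {self.phase.value}")
        if connector == "start":
            self._snapshot = dict(self.env_model)  # model before execution
        self.phase = TRANSITIONS[key]
        if self.phase is Phase.NEUTRAL:
            # stop / succeed / fail: the environment model recovers its previous form
            self.env_model.clear()
            self.env_model.update(self._snapshot)
        return self.phase
```

Allowing fail from the suspended phase anticipates the resource manager described below, which fails a suspended modality when the control HTN that preempted it does not succeed.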

The feedback from sm functions to modalities has to be controlled, as well as
the resource sharing of parallel activities. The control system catches and
reacts appropriately to reports emitted by sm functions. Reports from sm
functions play the same role in the control system as tasks in modalities. A
report of some type activates its own dedicated control HTN in a reactive way. A
control tree represents a temporary modality and cannot be interrupted. A
nominal report signals a normal execution. Otherwise, a non-nominal report
signals a particular type of sm function execution, and the aim of the
corresponding control tree is to recover a nominal modality execution. Some
non-nominal reports can be non-recoverable failures. In these cases, the
corresponding control sends a “fail” message to the modality pursuing this sm
function. Nominal reports may notify the success of the global task. In this
case, the “success” alternative of the modality is activated.

Resources to be managed are either physical non-sharable resources (e.g.,
motors, cameras, pan-and-tilt mount) or logical resources (the environment
model, which can be temporarily modified). The execution of a set of concurrent
non-blocking actions can imply the simultaneous execution of different sm
functions. Because of that, several reports may appear at the same time and
induce the simultaneous activation of several control activities. These
concurrent executions may generate resource conflicts. To manage these
conflicts, a resource manager organizes the resource sharing with semaphores
and priorities.

When a non-nominal report is issued, a control HTN starts its execution. It 
requests the resource it needs. If this resource is already in use by a start connec- 
tor of a modality, the manager sends to this modality a suspend message, and 
leaves a resume message for the modality in the spooler according to its priority. 
The suspend alternative is executed freeing the resource, enabling the control 
HTN to be executed. If the control execution succeeds, waiting messages are 
removed and executed until the spooler becomes empty. If the control execution 
fails, the resume message is removed from the spooler and the fail alternative is 
executed for the modality. 
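The suspend/resume protocol just described can be illustrated with a small sketch. It is an assumption-laden simplification, not the system's implementation: the class and method names (`ResourceManager`, `start_modality`, `control_request`, `control_done`) are hypothetical, a single non-sharable resource is modelled, a single holder slot plays the role of the semaphore, and lower priority numbers are served first.

```python
import heapq
import itertools


class ResourceManager:
    """Hypothetical manager for one non-sharable resource (e.g. a camera)."""

    def __init__(self):
        self.holder = None               # (name, priority) of the current user, or None
        self.spooler = []                # heap of pending resume messages
        self._seq = itertools.count()    # FIFO tie-break within one priority level
        self.log = []                    # messages sent, kept for inspection

    def start_modality(self, name, priority):
        # A start connector of a modality takes the free resource.
        assert self.holder is None, "resource busy"
        self.holder = (name, priority)

    def control_request(self):
        # A control HTN, triggered by a non-nominal report, needs the resource.
        if self.holder is not None:
            name, priority = self.holder
            self.log.append(("suspend", name))  # suspend alternative frees the resource
            heapq.heappush(self.spooler, (priority, next(self._seq), name))
        self.holder = ("control", 0)

    def control_done(self, succeeded):
        # The control HTN releases the resource, then the spooler is emptied.
        self.holder = None
        while self.spooler:
            _, _, name = heapq.heappop(self.spooler)
            # Success: the waiting resume message is executed (only logged here).
            # Failure: the resume message is dropped and the fail alternative runs.
            self.log.append(("resume" if succeeded else "fail", name))
```

For example, starting modality `"M1"`, issuing a control request, and finishing the control successfully yields the log `[("suspend", "M1"), ("resume", "M1")]`.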



3.3 The Controller 

The Control Space. The controller has to choose the modality that is most appropriate 
to the current state for pursuing the task. To do this, a set of 
control variables has to reflect control information for the sm functions. The 
choice of these control variables is an important design issue. For example, in 
the navigation task, the control variables are: 

• The cluttering of the environment which is defined to be a weighted sum of the 
distances to nearest obstacles perceived by the laser, with a dominant weight 
along the robot motion axis. This is an important piece of information to 
establish the execution conditions of the motion and localization sm functions. 



Fig. 5. The Robel control system (panel: Navigation task).

• The angular variation of the profile of the laser range data, which characterizes 
the robot area. Close to a wall, the cluttering value is high but the angular 
variation remains low, whereas in an open area the cluttering is low while the 
angular variation may be high. 

• The inaccuracy of the position estimate, as computed from the covariance 
matrix maintained by each localization sm function. 

• The confidence in the position estimate. The inaccuracy is not sufficient to 
qualify the localization. Each localization sm function supplies a confidence 
estimate about the last processed position. 

• The navigation color of the current area. When the robot position estimate falls 
within some labeled cell of the topological graph, the corresponding labels 
are taken into account, e.g., Corridor, Corridor with landmarks, Large door, 
Narrow door, Confined area, Open area, Area with fixed localization. 

• The current modality. This information is essential to assess the control state 
and possible transitions between modalities. 

A control state is characterized by the values of these control variables. Con- 
tinuous variables are discretized over a few significant intervals. In addition, 
there is a global failure state that is reached whenever the control of a modality 
reports a failure. We finally end up with a discrete control space, which enables 
us to define a control automaton. 
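A minimal sketch of how the control variables listed above might be mapped to a discrete control state. The interval bounds, field names, and modality labels are all hypothetical; the text only states that continuous variables are discretized over a few significant intervals.

```python
from bisect import bisect_right
from typing import NamedTuple

# Hypothetical interval boundaries; the real ones are a design choice.
CLUTTER_BOUNDS = [0.5, 1.5]      # low / medium / high
ANGULAR_BOUNDS = [0.3]           # low / high
INACCURACY_BOUNDS = [0.1, 0.5]   # good / fair / poor
CONFIDENCE_BOUNDS = [0.5]        # low / high

FAILURE = "global-failure"       # the global failure state


class ControlState(NamedTuple):
    clutter: int
    angular: int
    inaccuracy: int
    confidence: int
    color: str       # label of the topological cell, e.g. "Corridor"
    modality: str    # currently executing modality (hypothetical name)


def level(value, bounds):
    """Index of the interval delimited by `bounds` that contains `value`."""
    return bisect_right(bounds, value)


def control_state(clutter, angular, inaccuracy, confidence, color, modality):
    """Discretize the continuous control variables into one control state."""
    return ControlState(
        level(clutter, CLUTTER_BOUNDS),
        level(angular, ANGULAR_BOUNDS),
        level(inaccuracy, INACCURACY_BOUNDS),
        level(confidence, CONFIDENCE_BOUNDS),
        color,
        modality,
    )
```

Because every field takes only a few values, the product of these domains stays small, which is what keeps the control space at a few thousand states.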



The Control Automaton. The control automaton is nondeterministic: unpre- 
dictable external events may modify the environment, e.g. someone passing by 
may change the value of the cluttering variable, or the localization inaccuracy 
variable. Therefore the execution of the same modality in a given state may lead 
to different adjacent states. This nondeterministic control automaton is defined 
as the tuple $\Sigma = \langle S, A, P, C \rangle$: 

• $S$ is a finite set of control states, 

• $A$ is a finite set of modalities, 

• $P : S \times A \times S \to [0, 1]$ is a probability distribution on the state-transition 
function; $P_a(s' \mid s)$ is the probability that the execution of modality $a$ in state 
$s$ leads to state $s'$, 

• $C : A \times S \times S \to \mathbb{R}^+$ is a positive cost function; $C(a, s, s')$ corresponds to 
the average cost of performing the state transition from $s$ to $s'$ with the 
modality $a$. 

A and S are given by design from the definition of the set of modalities and 
of the control variables. In the navigation system illustrated here, there are 5 
modalities and a few thousand states. P and C are obtained from observed 
statistics during a learning phase. 

The control automaton $\Sigma$ is a Markov Decision Process (MDP). As an MDP, $\Sigma$ 
could be used reactively on the basis of a universal policy $\pi$, which selects for 
a given state $s$ the best modality $\pi(s)$ to be executed. However, a universal 
policy does not take into account the current navigation goal. A more precise 
approach explicitly takes into account the navigation goal, transposed into $\Sigma$ as 
a set $S_g$ of goal states in the control space. This set $S_g$ is given by a look-ahead 
mechanism based on a search for a path in $\Sigma$ that reflects a topological route to 
the navigation goal (see Figure 5). 

Goal States in the Control Space. Given a navigation task, a search in the topological 
graph provides an optimal route $r$ to the goal, taking into account the 
estimated cost of edges between topological cells. This route helps in finding 
possible goal control states in the control automaton for planning a policy. The route 
$r$ is characterized by the pair $(\sigma_r, l_r)$, where $\sigma_r = (c_1 c_2 \ldots c_k)$ is the sequence of 
colors of traversed cells, and $l_r$ is the length of $r$. 

Now, a path between two states in $\Sigma$ also defines a sequence of colors $\sigma_{path}$, 
those of the traversed states; it has a total cost, that is, the sum $\sum_{path} C(a, s, s')$ 
over all traversed arcs. A path in $\Sigma$ from the current control state $s_0$ to a state 
$s$ corresponds to the planned route when the path matches the features of the 
route $(\sigma_r, l_r)$ in the following way: 

• $\sum_{path} C(a, s, s') \approx K l_r$, $K$ being a constant ratio between the cost of a state 
transition in the control automaton and the corresponding route length, 

• $\sigma_{path}$ corresponds to the same sequence of colors as $\sigma_r$ with possible repetition 
factors, i.e., there are factors $i_1 > 0, \ldots, i_k > 0$ such that 
$\sigma_{path} = (c_1^{i_1}, c_2^{i_2}, \ldots, c_k^{i_k})$ when $\sigma_r = (c_1, c_2, \ldots, c_k)$. 

This last condition requires that we traverse in $\Sigma$ control states 
having the same color as the planned route. A repetition factor corresponds to the 
number of control states, at least one, required for traversing a topological cell. 
The first condition makes it possible to prune paths in $\Sigma$ that meet the condition on the 
sequence of colors but cannot correspond to the planned route. However, paths 
in $\Sigma$ that contain a loop (i.e. involving a repeated control sequence) necessarily 
meet the first condition. 

Let $\mathit{route}(s_0, s)$ be true whenever the optimal path in $\Sigma$ from $s_0$ to $s$ meets the 
two previous conditions, and let $S_g = \{ s \in S \mid \mathit{route}(s_0, s) \}$. A Moore-Dijkstra 
algorithm starting from $s_0$ gives optimal paths to all states in $\Sigma$ in $O(n^2)$, where $n$ is 
the number of control states. For every such path, the predicate $\mathit{route}(s_0, s)$ is checked 
in a straightforward way, which gives $S_g$. 
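The computation of $S_g$ can be sketched as follows. This is an illustration under stated assumptions: the arc cost between two control states is taken to be a single number (for instance the cheapest $C(a, s, s')$ over modalities), the cost condition is tested with a hypothetical tolerance `tol`, and a binary-heap Dijkstra, which runs in $O((n + m) \log n)$ rather than the $O(n^2)$ array version mentioned above, stands in for the Moore-Dijkstra algorithm.

```python
import heapq
from itertools import groupby


def dijkstra(edges, source):
    """Shortest paths from `source`; edges: state -> list of (next_state, cost)."""
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist[s]:
            continue  # stale heap entry
        for t, c in edges.get(s, []):
            if d + c < dist.get(t, float("inf")):
                dist[t], parent[t] = d + c, s
                heapq.heappush(heap, (d + c, t))
    return dist, parent


def path_to(parent, s):
    """Rebuild the optimal path from the source to s."""
    path = []
    while s is not None:
        path.append(s)
        s = parent[s]
    return path[::-1]


def goal_states(edges, color, s0, route_colors, route_length, K, tol):
    """S_g = {s | route(s0, s)} for a planned route r = (sigma_r, l_r)."""
    dist, parent = dijkstra(edges, s0)
    goals = set()
    for s, cost in dist.items():
        colors = [color[x] for x in path_to(parent, s)]
        # Collapse repetitions: (c1^i1, ..., ck^ik) -> (c1, ..., ck).
        collapsed = [c for c, _ in groupby(colors)]
        if collapsed == list(route_colors) and abs(cost - K * route_length) <= tol:
            goals.add(s)
    return goals
```

The `groupby` call implements the repetition-factor condition directly: consecutive states of the same color collapse to a single color, so the collapsed sequence must equal $\sigma_r$.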

It is important to notice that this set $S_g$ of control states is a heuristic 
projection of the planned route to the goal. There is no guarantee that following 
blindly (i.e., in an open-loop control) a path in $\Sigma$ that meets $\mathit{route}(s_0, s)$ will 
lead to the goal; and there is no guarantee that every successful navigation to 
the goal corresponds to a sequence of control states that meets $\mathit{route}(s_0, s)$. This 
is only an efficient and reliable way of focusing the MDP cost function with respect 
to the navigation goal and to the planned route. 

Finding a Control Policy. At this point we have to find the best modality to 
apply to the current state $s_0$ in order to reach a state in $S_g$, given the probability 
distribution function $P$ and the cost function $C$. 

A simple adaptation of the Value Iteration algorithm solves this problem. 
Here we only need to know $\pi(s_0)$. Hence the algorithm can be focused on a 
subset of states, basically those explored by the Moore-Dijkstra algorithm. 
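One plausible adaptation of Value Iteration is sketched below; it is not necessarily the one used in the system. Goal states in $S_g$ get a cost-to-go of 0, iteration is restricted to a given subset of states, and states outside that subset (or without any modality) receive a hypothetical large penalty `dead_end`. The dictionaries `P` and `C` mirror the definitions of $\Sigma$.

```python
def focused_value_iteration(states, P, C, goals, s0,
                            dead_end=1e6, eps=1e-9, max_iter=10_000):
    """Return (pi(s0), V) for reaching `goals` at minimal expected cost.

    states: subset of control states to iterate over
    P: dict (s, a) -> {s_next: probability}
    C: dict (a, s, s_next) -> cost
    goals: the goal set S_g, whose cost-to-go is fixed at 0
    """
    acts = {s: [a for (x, a) in P if x == s] for s in states}
    V = {s: (0.0 if s in goals or acts[s] else dead_end) for s in states}

    def q(s, a):
        # Expected cost of applying modality a in s, then acting optimally.
        return sum(p * (C[(a, s, t)] + V.get(t, dead_end))
                   for t, p in P[(s, a)].items())

    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            if s in goals or not acts[s]:
                continue
            new = min(q(s, a) for a in acts[s])
            delta = max(delta, abs(new - V[s]))
            V[s] = new  # in-place (Gauss-Seidel) update
        if delta < eps:
            break
    return min(acts[s0], key=lambda a: q(s0, a)), V
```

In a small example where a "safe" modality reaches the goal at cost 10 and a "fast" one reaches it at cost 2 with probability 0.8 (falling back to $s_0$ at an extra cost of 1 otherwise), the expected cost of "fast" converges to 2.75, so $\pi(s_0)$ is "fast".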

The closed-loop controller uses this policy as follows: 

• the computed modality $\pi(s_0)$ is executed; 

• the robot observes the new control state $s$, updates its route $r$ and its set 
$S_g$ of goal states with respect to $s$, and finds the new modality to apply to $s$. 

This is repeated until the control reports a success or a failure. Recovery from a 
failure state consists of trying an untried modality from the parent state. If none 
is available, a global failure of the task is reported. 
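The control loop and its recovery rule can be sketched as below. The callbacks `choose` (re-planning $\pi(s)$ after updating $r$ and $S_g$) and `execute` (running one modality and returning the observed state and a status) are hypothetical stand-ins for the planner and the robot.

```python
def closed_loop(s0, choose, execute, max_steps=1000):
    """Run the controller until success or global failure.

    choose(s, excluded) -> best modality for s not in `excluded`, or None.
    execute(s, a) -> (next_state, status), status in {"running", "success", "failure"}.
    """
    failed = {}  # parent state -> modalities that led to the failure state from it
    s = s0
    for _ in range(max_steps):
        a = choose(s, failed.get(s, set()))
        if a is None:
            return "failure"          # no untried modality left: global failure
        s_next, status = execute(s, a)
        if status == "success":
            return "success"
        if status == "failure":
            # Recovery: stay at the parent state s and try an untried modality.
            failed.setdefault(s, set()).add(a)
            continue
        s = s_next                    # observe the new control state
    return "failure"
```

Only modalities that actually failed from a state are excluded there, so normal re-planning at a revisited state is unaffected.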

Estimating the Parameters of the Control Automaton. A sequence of randomly 
generated navigation goals can be given to the robot. During its motion, new 
control states are met and new transitions are recorded or updated. Each time 
a transition from s to s' with modality a is performed, the traversed distance 
and speed are recorded, and the average speed v of this transition is updated. 
The cost of the transition $C(a, s, s')$ can be defined as a weighted average of the 
traversal time for this transition, taking into account the possible control steps 
required during the execution of the modality $a$ in $s$ together with the outcome 
of that control. The statistics on $a(s)$ are recorded to update the probability 
distribution function. 
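A minimal sketch of this bookkeeping, under two assumptions: $P_a(s' \mid s)$ is estimated by relative frequencies, and a plain running mean of traversal times stands in for the weighted average described above. The class and method names are hypothetical.

```python
from collections import defaultdict


class TransitionStats:
    """Running estimates of P and C from recorded transitions (a sketch)."""

    def __init__(self):
        self.n_sa = defaultdict(int)         # executions of modality a in s
        self.n_sas = defaultdict(int)        # observed transitions (s, a, s')
        self.mean_time = defaultdict(float)  # average traversal time per (a, s, s')

    def record(self, s, a, s_next, traversal_time):
        self.n_sa[(s, a)] += 1
        self.n_sas[(s, a, s_next)] += 1
        n = self.n_sas[(s, a, s_next)]
        key = (a, s, s_next)
        # Incremental mean: m_n = m_{n-1} + (x - m_{n-1}) / n
        self.mean_time[key] += (traversal_time - self.mean_time[key]) / n

    def prob(self, s, a, s_next):
        """Relative-frequency estimate of P_a(s' | s)."""
        n = self.n_sa[(s, a)]
        return self.n_sas[(s, a, s_next)] / n if n else 0.0

    def cost(self, a, s, s_next):
        """Estimate of C(a, s, s') as the mean traversal time."""
        return self.mean_time[(a, s, s_next)]
```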

Several strategies can be defined to learn $P$ and $C$ in $\Sigma$. For example: 

• A modality is chosen randomly for a given task; this modality is pursued until 
either it succeeds or a fatal failure is notified. In the latter case, a new modality 
is chosen randomly and is executed according to the same principle. This 
strategy is used initially to expand $\Sigma$. 

• $\Sigma$ is used according to the normal control, except in a state for which not 
enough data has been recorded; there a modality is applied randomly 
in order to augment the known statistics, e.g., by randomly choosing an untried 
modality in that state. 
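The second strategy can be sketched as a modality-selection rule. The threshold `min_visits` and the function name are hypothetical; the text does not say how "not enough data" is measured.

```python
import random


def choose_modality(s, policy, modalities, visits, min_visits=5, rng=random):
    """Use the policy, except in states with too little recorded data.

    visits: dict (s, a) -> number of recorded executions of a in s.
    """
    if sum(visits.get((s, a), 0) for a in modalities) < min_visits:
        # Under-explored state: prefer a random untried modality.
        untried = [a for a in modalities if visits.get((s, a), 0) == 0]
        return rng.choice(untried or modalities)
    return policy(s)
```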

3.4 Analysis of the Approach 

The system described here has been deployed on the Diligent robot, an indoor 
mobile platform, and extensively tested in navigation tasks within 
a large laboratory environment [38,39]. The approach is fairly generic and illustrates 
the use of planning techniques in robotics, not for the synthesis of mission 
plans but for achieving a robust execution of their high-level steps. It is not 
limited to navigation; it can be deployed on other robot activities. 

The HTN planning technique used for specifying detailed alternative plans 
to be followed by a controller for decomposing a complex task into primitive 
actions is fairly general and powerful. It can be widely applied in robotics since 
it makes it possible to take into account closed-loop feedback from sensors and primitive 
actions. It significantly extends, and can rely on, the capabilities of the rule-based 
or procedure-based languages for programming reactive controllers, as in the 
system described here. 

The MDP planning technique relies on an abstract dedicated space, namely 
the space of control states for the navigation task. The size of such a space 
is just a few thousand states. Consequently, the estimation of the parameter 
distributions in $\Sigma$ is feasible in a reasonable time, and the MDP algorithms can be 
used efficiently on-line, at each control step. The counterpart of these advantages is 
the ad hoc definition of the control space which requires a very good knowledge 
of the sensory-motor functions and the navigation task. While in principle the 
system described here can be extended by the addition of new modalities for the 
same task, or for other tasks, it is not clear how easy it would be to update the 
control space or to define new spaces for other tasks. 

4 Discussion 

Robot motion planning is a very advanced research field [35, 28]. The early techniques 
in the eighties were mostly dedicated to deterministic algorithms 
[36]. They led to a good understanding and formalization of the problem, as 
well as to several developments on related topics such as manipulation planning 
[2]. More recent approaches have built on this state of the art with probabilistic 
algorithms that permitted a significant scale-up [4]. The probabilistic roadmap 
techniques introduced in [30] gave rise to several successful developments [9, 29, 
20, 24, 10, 32, 46], which today represent the most efficient approaches to path 
planning. Roadmap techniques are certainly not limited to navigation tasks; 
they have been deployed in other application areas, within robotics, e.g., for 
manipulation, or in CAD and graphics animation. The illustrations and performance 
figures in Section 2 are borrowed from Move3D, a state-of-the-art system 
implementing roadmap techniques [45]. 

Sensory-motor functions are at the core of robotics. They correspond 
to a very wide research area, ranging from signal processing, computer vision 
and learning, to biomechanics and neuroscience. Approaches relevant to the sm 
functions presented here are, for example, 

• the techniques used for localization and mapping, e.g., the SLAM methods 
[40,49,14,50], 

• the methods for structuring the environment model into a topological map 
with areas labeled by different navigation colors [33,48], 

• the visual localization techniques, e.g., [22], and 

• the flexible control techniques, e.g., [42,44]. 

Several high-level reactive controllers are widely deployed in laboratory 
robots. They permit a preprogrammed goal-directed and event-reactive closed- 
loop control, integrating acting and sensing. They rely on rule-based or 
procedure-based systems, such as PRS, RAP, SRC and others [16,25,11,15]. 
More recent developments on these systems, e.g., [13], aim at a closer integra- 
tion to planning. The behavior-based controllers, e.g., [3], that usually focus on 
a more reactive set of concurrent activities, have also led to more goal-directed 
developments, e.g., [23]. The robot architecture, that is, the organization that makes it 
possible to properly integrate the sensory-motor functions, the reactive control 
system and the deliberative capabilities [1,47], remains an important issue. 

The planning and robotics literature reports on several plan-based robot 
controllers with objectives similar to those discussed here, such as for example 
[5,7,6,31,8]. The approach of Beetz [6] has also been deployed for controlling 
an indoor robot carrying out the chores of an office courier. It relies on SRC 
reactive controllers. These are concurrent control routines that adapt to changing 
conditions by reasoning on and modifying plans. They rely on the XFRM system 
that manipulates reactive plans and is able to acquire them through learning with 
XFRMLEARN [7]. 

In addition to plan-based controllers, there is an active area of research that 
aims at interleaving task planning activities together with execution control and 
monitoring activities. Several approaches have been developed and applied, for 
example, to space and military applications, e.g., within the SIPE [41] or the 
CASPER [12] systems. Applications in robotics are for example the ROGUE 
system [21], and more recently the IxTeT-eXeC system [37] that integrates a 
sophisticated time and resource handling mechanism for planning and controlling 
the mission of an exploration robot. 
Extended Abstract 

Recent progress in mobile broadband communication and semantic web technology is 
enabling innovative internet services that provide advanced personalization and local- 
ization features. The goal of the SmartWeb project (duration: 2004-2007) is to lay the 
foundations for multimodal user interfaces to distributed and composable semantic 
Web services on mobile devices. The SmartWeb consortium brings together experts 
from various research communities: mobile services, intelligent user interfaces, lan- 
guage and speech technology, information extraction, and semantic Web technologies 
(see www.smartweb-project.org). 

SmartWeb is based on two parallel efforts that have the potential of forming the 
basis for the next generation of the Web. The first effort is the semantic Web [1] 
which provides the tools for the explicit markup of the content of Web pages; the 
second effort is the development of semantic Web services, which results in a Web 
where programs act as autonomous agents to become the producers and consumers of 
information and enable automation of transactions. 

The appeal of being able to ask a question to a mobile internet terminal and receive 
an answer immediately has been renewed by the broad availability of information on 
the Web. Ideally, a spoken dialogue system that uses the Web as its knowledge base 
would be able to answer a broad range of questions. In practice, the size and dynamic 
nature of the Web and the fact that the content of most web pages is encoded in 
natural language make this an extremely difficult task. However, SmartWeb exploits 
the machine-understandable content of semantic Web pages for intelligent question- 
answering as a next step beyond today’s search engines. Since semantically annotated 
Web pages are still very rare due to the time-consuming and costly manual markup, 
SmartWeb is using advanced language technology and information extraction meth- 
ods for the automatic annotation of traditional web pages encoded in HTML or XML. 

SmartWeb deals not only with information-seeking dialogues but also 
with task-oriented dialogues, in which the user wants to perform a transaction via a 
Web service (e.g. buy a ticket for a sports event or program his navigation system to 
find a souvenir shop). 

SmartWeb is the follow-up project to SmartKom (www.smartkom.org), carried out 
from 1999 to 2003. SmartKom is a multimodal dialog system that combines speech, 
gesture, and facial expressions for input and output [2]. Spontaneous speech understanding 
is combined with the video-based recognition of natural gestures and facial 
expressions. One version of SmartKom serves as a mobile travel companion that 
helps with navigation and point-of-interest information retrieval in location-based 
services (using a PDA as a mobile client). The SmartKom architecture [3] supports 
not only simple multimodal command-and-control interfaces, but also coherent and 
cooperative dialogues with mixed initiative and a synergistic use of multiple modali- 
ties. Although SmartKom works in multiple domains (e.g. TV program guide, tourist 
information), it supports only restricted-domain question answering. SmartWeb goes 
beyond SmartKom in supporting open-domain question answering using the entire 
Web as its knowledge base. 

SmartWeb provides a context-aware user interface, so that it can support the user 
in different roles, e.g. as a car driver, a motor biker, a pedestrian or a sports spectator. 
One of the planned demonstrators of SmartWeb is a personal guide for the 2006 
FIFA World Cup in Germany that provides mobile infotainment services to soccer 
fans, anywhere and anytime. Another SmartWeb demonstrator is based on P2P communication 
between a car and a motorbike. When the car’s sensors detect aquaplaning, 
a motor biker following behind is warned by SmartWeb: “Aqua-planing danger in 
200 meters!”. The biker can interact with SmartWeb through speech and haptic feed- 
back; the car driver can input speech and gestures. 

SmartWeb is based on two new W3C standards for the semantic Web, the Re- 
source Description Framework (RDF/S) and the Web Ontology Language (OWL) for 
representing machine interpretable content on the Web. OWL-S ontologies support 
semantic service descriptions, focusing primarily on the formal specification of in- 
puts, outputs, preconditions, and effects of Web services. In SmartWeb, multimodal 
user requests will not only lead to automatic Web service discovery and invocation, 
but also to the automatic composition, interoperation and execution monitoring of 
Web services. 

The academic partners of SmartWeb are the research institutes DFKI (consortium 
leader), FhG FIRST, and ICSI together with university groups from Erlangen, Karlsruhe, 
Munich, Saarbrücken, and Stuttgart. The industrial partners of SmartWeb are 
BMW, DaimlerChrysler, Deutsche Telekom, and Siemens as large companies, as well 
as EML, Ontoprise, and Sympalog as small businesses. The German Federal Ministry 
of Education and Research (BMBF) is funding the SmartWeb consortium with grants 
totaling 13.7 million euros. 
Abstract. Although Reinforcement Learning methods have by now 
been successfully applied to a wide range of different application scenarios, 
there is still a lack of methods that would allow the direct application 
of reinforcement learning to real systems. The key capability of 
such learning systems is efficiency with respect to the number of interactions 
with the real system. Several examples are given that illustrate 
recent progress made in that direction. 



1 Introduction 

In recent years, many successful (real-world) applications of machine learning 
methods have been developed, e.g. in classification, diagnosis or forecasting 
tasks. However, for a broad acceptance of learning methods as a standard software 
tool, many theoretical and practical problems still have to be solved. This 
is especially true for Reinforcement Learning scenarios, where the only training 
signal is given in terms of success or failure. Although this paradigm is in principle 
very powerful due to the minimal requirements on training information, 
today’s real-world applications often fail due to the large number of training 
experiences required until a task is successfully learned. Our research effort lies in narrowing 
this gap by developing methods for data-efficient and robust machine 
learning. Autonomous robots, some of them with the ability to play soccer, 
are one of our favorite testbeds. Examples of efficient machine learning methods 
and their integration into large software systems are given for several (real-world) 
tasks. 

2 Basic Idea of Reinforcement Learning 

The framework we consider in Reinforcement Learning (RL) is a standard Markov 
Decision Process (MDP) [10]. An MDP is a 4-tuple $(S, A, T, r)$, where $S$ denotes 
a finite set of states, $A$ denotes the action space, and $T$ is a probabilistic 
transition function $T : S \times A \times S \to [0, 1]$ that denotes the probability for 
a transition from state $s$ to state $s'$ when a certain action $a$ is applied. Finally, 
$r : S \times A \to \mathbb{R}$ is a reward function that denotes the immediate reward 
for applying a certain action $a$ in a certain state $s$. We are looking for 
an optimal policy $\pi^* = \arg\min_\pi J^\pi(s)$ that minimizes the expected path costs 
$J^\pi(s) = E\{\sum_t r(s_t, \pi(s_t))\}$, with $s_0 = s$. The value iteration procedure is given by the 
following recursive equation: 

$$J_{k+1}(s) = \min_{a \in A} E\{r(s, a) + J_k(s')\},$$ 

where $s'$ is the successor state that is reached with probability $T(s, a, s')$, and $E$ denotes the 
expectation. Value iteration can be shown to reach the optimal path costs $J^*(s)$ in 
the limit under certain assumptions. The optimal policy is then given by greedily 
exploiting the optimal value function: $\pi^*(s) \in \arg\min_{a \in A} E\{r(s, a) + J^*(s')\}$. 

In the case of a finite state space, incrementally updating the value function for 
every state finally approaches the optimal value function and therefore yields 
the optimal policy. 
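The value iteration recursion can be illustrated on a small finite MDP. This is a generic tabular sketch, not code from the Brainstormers system: since $r(s, a)$ is deterministic here, $E\{r(s, a) + J_k(s')\} = r(s, a) + \sum_{s'} T(s, a, s') J_k(s')$, and goal states are assumed to be absorbing with zero cost.

```python
def value_iteration(S, A, T, r, terminal, eps=1e-9, max_sweeps=10_000):
    """Tabular value iteration for path costs, followed by the greedy policy.

    T(s, a, s2) -> transition probability, r(s, a) -> immediate cost,
    terminal: set of absorbing, zero-cost states.
    """
    J = {s: 0.0 for s in S}
    for _ in range(max_sweeps):
        # J_{k+1}(s) = min_a [ r(s, a) + sum_{s'} T(s, a, s') * J_k(s') ]
        J_new = {s: 0.0 if s in terminal else
                 min(r(s, a) + sum(T(s, a, t) * J[t] for t in S) for a in A)
                 for s in S}
        done = max(abs(J_new[s] - J[s]) for s in S) < eps
        J = J_new
        if done:
            break
    # Greedy exploitation of the (approximately) optimal value function.
    policy = {s: min(A, key=lambda a: r(s, a) + sum(T(s, a, t) * J[t] for t in S))
              for s in S if s not in terminal}
    return J, policy
```

On a four-state chain with a goal at the right end and unit cost per step, the path costs converge to 3, 2, 1, 0 and the greedy policy moves right everywhere.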



3 Enhancing the Basic Framework 

When it comes to real-world applications, the assumption of a finite state space is 
often not met. Especially in control applications, state variables are continuous, 
and state spaces are often of dimension 4, 5 or higher. Function approximators 
such as CMAC, feedforward neural networks or grid-based approaches have to 
be applied in the case of continuous state variables in order to represent the value 
function. Also, instead of an individual update of each state (which is not feasible 
in the case of an infinite state space), the value function is updated at certain 
points only, e.g. along concrete trajectories [9]. Due to the different behaviour 
of function approximators compared to a table-based value function, the 
reinforcement learning process itself also has to be adapted: whereas in the case of a table 
the order of the updates is uncritical as long as some fairness conditions are met, 
the careful choice of training experiences plays an important role when function 
approximators are used, since every update in parameters can change the value 
function at many other places. 
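The last point can be made concrete with a grid-based approximator, one of the kinds named above. The sketch below (a one-dimensional uniform grid with linear interpolation; the class name and learning rate are illustrative assumptions) shows that a single update at one point also changes the value at other states sharing the same grid nodes.

```python
class GridValueFunction:
    """1-D value function on a uniform grid with linear interpolation."""

    def __init__(self, lo, hi, n_cells):
        self.lo, self.hi, self.n = lo, hi, n_cells
        self.w = [0.0] * (n_cells + 1)  # one weight per grid node

    def _cell(self, x):
        # Index of the left grid node and the relative position inside the cell.
        u = (x - self.lo) / (self.hi - self.lo) * self.n
        i = min(int(u), self.n - 1)
        return i, u - i

    def value(self, x):
        i, frac = self._cell(x)
        return (1 - frac) * self.w[i] + frac * self.w[i + 1]

    def update(self, x, target, alpha=0.5):
        # Gradient step on 0.5 * (target - V(x))^2 for the two active weights.
        i, frac = self._cell(x)
        err = target - self.value(x)
        self.w[i] += alpha * err * (1 - frac)
        self.w[i + 1] += alpha * err * frac
```

With nodes at 0, 0.25, 0.5, 0.75 and 1, an update towards 1.0 at x = 0.3 also raises the values at x = 0.25 and x = 0.5, while the value at x = 0.9 is unaffected.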

3.1 Example Applications 

One of the first big successes in RL was Tesauro’s TD-Gammon system, which used 
a neural network to represent the value function. By pure self-play it was finally 
able to play backgammon at grandmaster level. Further examples range from 
elevator control [7] and scheduling [8] to helicopter control. 

Our research group has a special focus on learning closed-loop control 
systems for technical applications. Among the successful applications are the 
control of a chemical plant (2 state variables, strongly nonlinear), control of a 
single and double inverted pendulum (4 and 6 state variables, strongly unstable) 
[2], control of a combustion engine (industrial project, 4 state variables, 2 control 
inputs) [1], and thermostat control (industrial project, strongly varying time scales) 
[5]. 

In our Brainstormers project, we follow the approach of integrating learning 
and other AI techniques in a large software system, using the strengths of the 
individual methods and plugging them together. The Brainstormers are a team of 
virtual soccer-playing robots that compete in the RoboCup international competitions. 
Our goal is to show that learning techniques can be successfully applied 
in very complex environments and are able to compete with other approaches. 
Year by year, we increase the amount of learned behaviour, starting with basic 
skills (1999 and 2000), via positioning (2001), to multi-agent attack behaviour 
(2002-2004). In the last five world championships, we won three second prizes 
and two third prizes [3]. 

Martin Riedmiller 

4 Learning in Real Systems 

Despite the considerable successes of learning in complex and non-trivial tasks, 
the direct application of RL to real-world systems is still an open problem. While 
this can be explained to some extent by the very sparse training information 
(success or failure), which intrinsically requires exploration and experimentation, 
the typical number of some 10,000 trajectories or even more needed to learn a desired 
behaviour is not acceptable for many real-world systems. 

One common way to go is to build a simulation of the real system first and then 
to learn in simulation. If the simulation is accurate enough, the learned controller 
can be directly applied to the real system. This is often sufficient, since the closed-loop 
nature of the overall system can deal with smaller inaccuracies between the 
simulated and the real plant. An example of this is the learning of a positioning behaviour 
with obstacles for our omnidirectional soccer-playing robot [6]. However, the 
simulation approach requires the additional effort of modelling the real plant 
first, which is often costly and sometimes even infeasible. 

Therefore, there is an urgent need for efficient learning methods. We understand 
‘efficiency’ here in the sense that only a low number of interactions with 
the real system occurs during the learning phase. Efficiency does not necessarily 
mean that the overall learning process is fast, e.g. there might well be time-consuming 
offline phases to adapt internal parameters. 

One promising direction is to consider the problem on a coarser level. 
More precisely, the idea is to construct so-called abstract states, each of which 
contains a huge number of states of the original problem. Therefore, fixing the 
right policy for such an abstract state immediately defines the policy for a huge 
number of original states. This could vastly improve learning speed. However, 
there are two problems with this idea. One is to find the correct state space 
abstraction. At the moment, we rely on human intuition to solve that part of 
the problem, although we expect to come up with algorithmic solutions for this 
task. The other problem is that the original value iteration method can be shown 
to fail in general, even if a policy for the abstract state space exists in principle. 
We recently came up with several algorithms that are based on both search and 
ideas from Dynamic Programming and that can be guaranteed to find solutions 
under certain assumptions [4]. First results are promising, allowing, for example, 
a policy to swing up a pole to be learned in a few hundred trials instead of some 
ten thousand trials. 

5 Future Work 

Our current research concentrates on the development of better, i.e. more effi- 
cient methods for RL, but also on the embedding of such learning methods in 
larger software systems. We have also started an effort to establish a benchmark suite 
that allows various algorithms and methods to be compared with respect to figures 
relevant for learning the control of real systems, e.g. the number of interactions 
with the real system or the quality of the learned control policy. 
Abstract. This paper proposes a new corpus-based approach for deriving syntactic 
structures and generating parse trees of natural language sentences. The 
parts of speech (word categories) of words in the sentences play the key role for 
this purpose. The grammar formalism used is more general than most of the 
grammar induction methods proposed in the literature. The approach was tested 
on Turkish using a corpus of more than 5,000 sentences, and successful 
results were obtained. 



1 Introduction 

In this paper, we propose a corpus-based approach for deriving the syntactic struc- 
tures of sentences in a natural language and forming parse trees of these sentences. 
The method is based on a concept which we call proximity. The parts of speech 
(word categories) of words in the sentences play the key role in determining the syn- 
tactic relationships within sentences. The data about the order and frequency of word 
categories are collected from a corpus and are converted to proximity measures for 
word categories and sentences. Then these data are used to obtain probable parse trees 
for a given sentence. 

It is well-known that grammars of natural languages are highly complicated and 
powerful. There have been several efforts for obtaining suitable grammars for particu- 
lar languages that can generate most (if not all) of the sentences in those languages. 
The grammars defined manually for this purpose have limited success. The difficulty 
lies in the resistance of natural languages to syntactic formalization. It is not 
known exactly what the syntactic hierarchies inherent in the sentences are. In fact, it 
is very easy to define a grammar that can generate all sentences in a language, but 
such a “general” grammar also generates non-sentences. Thus, forming grammars that 
can include sentences and at the same time exclude non-sentences is the difficult part 
of this task. 

In order to overcome the difficulties posed by such rule-based approaches in processing 
natural languages, corpus-based approaches (collected under the name “statistical 
natural language processing”) have begun to emerge recently [1,2]. They embody 
the assumption that human language comprehension and production work with 
representations of concrete past language experiences, rather than with abstract 
grammatical rules. There are several studies on statistical natural language processing. 
One notable approach is the data-oriented parsing model [3,4,5]. This model requires 
annotated corpora in which the parse trees of sentences are explicit. The idea is to build new 
sentences by composing fragments of corpus sentences. 
An interesting field in which corpus-based approaches are used is grammar induction
(learning). In the literature, this usually means learning probabilistic context-free
grammars (PCFGs). As stated in [1], the simplest method (and the basic idea) is to
generate all possible rules, assign them some initial probabilities, run a training
algorithm on a corpus to improve the probability estimates, and identify the rules
with high probabilities as the grammar of the language. However, this method is
unrealistic because there is no bound on the number of possible rules; even with
constraints on the number of rules, the number is often so large that the computation
time becomes impractical. The usual solution is to restrict the rule types. In
[6,7,8,9], methods that use dependency grammars or Chomsky-normal-form grammars
are presented.



2 Outline of the Method 

Given a sentence, the categories of its words are determined first. The sentence is
initially considered as a single unit formed of the sequence of these categories. The
method then analyzes how this sequence can be divided into subsequences. For
nontrivial sentences, the number of possible divisions is quite large; in general it
grows exponentially with the length of the sentence. Among the possible divisions,
the best one is selected according to the data in the corpus. The result is a set of
smaller sentences. The same process is repeated for each sentence in this set until the
original sentence is partitioned into single categories.
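The top-down procedure can be sketched as a short recursion. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the hypothetical `choose_partition` hook stands in for the corpus-based selection of the best division (Section 4), and in the usage lines it is simply a lookup table that reproduces the divisions of the Figure 1 example.

```python
from typing import Callable, List, Sequence, Tuple, Union

Tree = Union[str, List["Tree"]]
Parts = List[Tuple[str, ...]]

def build_tree(categories: Sequence[str],
               choose_partition: Callable[[Tuple[str, ...]], Parts]) -> Tree:
    """Recursively split a category sequence until every leaf is a single category.

    choose_partition is a hypothetical hook: given a sequence of length >= 3 it
    returns the best division into contiguous, non-empty parts.
    """
    seq = tuple(categories)
    if len(seq) == 1:                  # a single category is a leaf
        return seq[0]
    if len(seq) == 2:                  # a pair can only be split one way
        parts: Parts = [(seq[0],), (seq[1],)]
    else:
        parts = choose_partition(seq)
    # Each part is treated as a new, smaller sentence.
    return [build_tree(part, choose_partition) for part in parts]

# Divisions assumed for [nanv] in the Figure 1 example (hypothetical lookup).
figure1 = {
    ("n", "a", "n", "v"): [("n", "a", "n"), ("v",)],
    ("n", "a", "n"): [("n",), ("a", "n")],
}
print(build_tree(["n", "a", "n", "v"], figure1.__getitem__))
# [['n', ['a', 'n']], 'v']
```

The nested list mirrors the combined partitions of Figure 2: each inner list is one node whose children are the parts chosen for it.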
Fig. 1. Partitions for the sentence “adam siyah şapkayı beğendi”

The process is illustrated for the following simple Turkish sentence in Figure 1: 

adam siyah şapkayı beğendi
(n) (a) (n) (v) 

man black hat+acc like+pst+3sg 
(the man liked the black hat) 

(The word categories that appear in this work are: a for adjective, d for adverb, n
for noun, and v for verb.) We represent the sentence as [nanv], a single sequence of
word categories. Suppose that, after all alternative divisions are evaluated, dividing
it into the two groups [nan] and v yields the best result, as shown in Figure 1.a.
These two subsequences are considered as new (smaller) sentences. The process ends
for the second one, since it consists of a single category. The other sentence ([nan])
is analyzed and divided into the subsequences n and [an] (Figure 1.b). Finally, the
only remaining subsequence ([an]) is partitioned into a and n, as shown in Figure 1.c.
(In fact, no analysis is needed for a subsequence of length two, since it can be
partitioned in only one way.) The process ends when all subsequences consist of a
single category. By combining the phases of this process, we obtain a tree structure
as the result of the analysis of the sentence. This is shown in Figure 2.



[nanv]
    [nan]
        n
        [an]
            a
            n
    v

Fig. 2. Combination of partitions for the sentence “adam siyah şapkayı beğendi”

As can be seen, the tree formed by the analysis of a sentence is very similar to a
parse tree of the sentence. It is converted into a parse tree by labelling the root node
with S and the intermediate nodes with special symbols $X_i$, $i \ge 1$, and by
extending the leaf nodes with the words of the sentence. The symbols $X_i$ denote
syntactic constituents such as NP and VP. Since the parse tree of a sentence is built
with respect to a grammar for the language, the grammatical rules inherent in the tree
can be extracted. This grammar induction process is beyond the scope of this
research, although it can be carried out with a simple mechanism once the parse trees
are available. Our aim here is limited to obtaining probable parses for sentences.
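Converting the analysis tree into a labelled parse tree is a simple relabelling. A minimal Python sketch, assuming the nested-list tree shape used above (leaves are category strings) and numbering the intermediate nodes $X_1, X_2, \ldots$ in pre-order; that numbering order is our assumption, since the text only says that intermediate nodes receive special symbols $X_i$.

```python
from itertools import count
from typing import Sequence, Union

Tree = Union[str, list]

def to_parse_tree(tree: Tree, words: Sequence[str]):
    """Label the root S, intermediate nodes X1, X2, ... (pre-order), attach words to leaves."""
    word_iter = iter(words)
    numbers = count(1)

    def walk(node: Tree, is_root: bool):
        if isinstance(node, str):                   # leaf: (category, word)
            return (node, next(word_iter))
        label = "S" if is_root else f"X{next(numbers)}"
        # Children are visited left to right, so words are consumed in sentence order.
        return (label, [walk(child, False) for child in node])

    return walk(tree, True)

tree = [["n", ["a", "n"]], "v"]                     # the tree of Figure 2
print(to_parse_tree(tree, ["adam", "siyah", "şapkayı", "beğendi"]))
# ('S', [('X1', [('n', 'adam'), ('X2', [('a', 'siyah'), ('n', 'şapkayı')])]), ('v', 'beğendi')])
```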



3 The Grammar Formalism 

The grammar type underlying the parse trees in this research is a restricted
context-free grammar: each rule has the form $N \to \alpha$, where $N$ is a nonterminal
symbol, $\alpha$ is a string of nonterminal and terminal symbols, and the number of
symbols in $\alpha$ is greater than one. The only restriction we impose on a context-free
grammar comes from the last part of this definition: a rule cannot derive a single
nonterminal or terminal symbol. We call a rule $N \to A$, where $N$ is a nonterminal and
$A$ is a nonterminal or terminal, a 1-1 rule, and the corresponding derivation a 1-1
derivation. 1-1 rules are not allowed in our grammar. Note that the grammar type we
employ is more general than those in [6,8,9], which are restricted to dependency and
Chomsky-normal-form grammars.

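The number of parse trees that can be generated by this grammar grows exponentially.
For a sentence formed of $n$ words (categories), there are $2^{n-1}-1$ alternative
derivations at the first level: each of the $n-1$ gaps between adjacent categories is
either a split point or not, and the single choice with no split point is excluded
because it would be a 1-1 derivation. If we continue enumerating the alternative
derivations until each category is a leaf node, we obtain a very large number of
parse trees.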
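These counts can be checked by brute force. The Python sketch below (function names are ours) enumerates the first-level derivations, i.e. every choice of at least one cut point among the $n-1$ gaps, and counts complete parse trees recursively under the no-1-1-derivation restriction by multiplying the number of subtrees of each part and summing over first-level derivations.

```python
from functools import lru_cache
from itertools import combinations
from typing import Iterator, List, Sequence

def first_level_partitions(seq: Sequence) -> Iterator[List[tuple]]:
    """Yield every division of seq into >= 2 contiguous, non-empty parts."""
    n = len(seq)
    for k in range(1, n):                           # number of cut points
        for cuts in combinations(range(1, n), k):   # positions of the cuts
            bounds = (0, *cuts, n)
            yield [tuple(seq[a:b]) for a, b in zip(bounds, bounds[1:])]

@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Number of parse trees over n categories when 1-1 derivations are forbidden."""
    if n == 1:
        return 1
    total = 0
    for parts in first_level_partitions(range(n)):
        ways = 1
        for part in parts:
            ways *= count_trees(len(part))          # independent choices per part
        total += ways
    return total

print(len(list(first_level_partitions("nanv"))))    # 7, i.e. 2**3 - 1
print([count_trees(n) for n in range(1, 8)])        # [1, 1, 3, 11, 45, 197, 903]
```

The full-tree counts are the little Schröder numbers (bracketings of $n$ symbols with no unary nodes), which is why exhaustive enumeration quickly becomes impractical and the method commits to one partition per level.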

A few words are in order about the restriction we impose on the grammar. Its purpose
is to limit the number of parse trees that can be generated to a finite (albeit very
large) number, and also to decrease the computation time. If 1-1 derivations were
allowed when enumerating all the parse trees of a sentence, there would obviously be
an infinite number of trees. When 1-1 derivations are not used, however, the number
of categories in a node is always less than that of its parent node, and thus the depth
of the tree is finite.

4 Parse Tree Generation 

The method makes use of a corpus containing individual sentences. For each sentence,
the categories of its words are found first, and then the number of occurrences of each
consecutive two-category, three-category, etc. combination is stored. We call each
such category combination a category string. In other words, for a sentence of $n$
categories $[c_1c_2\ldots c_n]$, the category strings are $[c_1c_2], [c_2c_3], \ldots,
[c_{n-1}c_n], [c_1c_2c_3], \ldots, [c_{n-2}c_{n-1}c_n], \ldots, [c_1c_2\ldots c_n]$. This
calculation is performed for each sentence in the corpus and the numbers are totalled.
The result indicates how frequently word categories are used consecutively. As might
be expected, short category strings usually have higher frequencies than long ones,
since short category strings also appear within long ones. We denote the frequency of
a category string $[c_ic_{i+1}\ldots c_j]$, $i<j$, by $Freq(c_i,c_{i+1},\ldots,c_j)$.
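Collecting these frequencies amounts to counting every contiguous run of at least two categories in every corpus sentence. A minimal Python sketch, assuming each sentence has already been reduced to its list of categories and that every occurrence is counted (so a string that occurs twice in one sentence contributes 2); the toy corpus is invented for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence

def category_string_frequencies(corpus: Iterable[Sequence[str]]) -> Counter:
    """Map each category string (as a tuple) to its total number of occurrences."""
    freq: Counter = Counter()
    for cats in corpus:
        n = len(cats)
        for length in range(2, n + 1):              # two-category, three-category, ...
            for start in range(n - length + 1):
                freq[tuple(cats[start:start + length])] += 1
    return freq

corpus = [["n", "a", "n", "v"], ["d", "n", "v"], ["n", "v"]]
freq = category_string_frequencies(corpus)
print(freq[("n", "v")], freq[("n", "a", "n", "v")])  # 3 1
```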

Definition: Given a sentence of $n$ words $[c_1c_2\ldots c_i\ldots c_j\ldots c_n]$, $n>1$,
$1 \le i < j \le n$, the category proximity of the category string $[c_ic_{i+1}\ldots c_j]$,
$CP(c_i,c_{i+1},\ldots,c_j)$, indicates the closeness of the categories $c_i, c_{i+1}, \ldots,
c_j$ to each other and is defined as follows:

$$CP(c_i,\ldots,c_j) = \frac{Freq(c_1,\ldots,c_n)}{Freq(c_i,\ldots,c_j)} \qquad (1)$$
$CP(c_i,\ldots,c_j)$ is a measure of the strength of the connection between the
categories $c_i, \ldots, c_j$ when they are considered as a single group. A small CP value
indicates a stronger connection: if $CP(c_i,\ldots,c_j)$ is small, $[c_i\ldots c_j]$ is more
likely to form a syntactic constituent.

Figure 3 compares a small CP value with a large CP value. For visualization, we
represent CPs as distances between the relevant nodes of a tree; that is,
$CP(c_i,\ldots,c_j)$ is the distance between the nodes $c_i$ and $c_j$. In Figure 3.a,
$CP(c_i,\ldots,c_j)$ is small (relative to Figure 3.c), which means that the categories
$c_i, \ldots, c_j$ are close to each other (i.e. this category combination occurs
frequently). Thus they tend to form a syntactic constituent, as shown in Figure 3.b.
(Note that the branches in Figure 3.a do not indicate a derivation, which is
emphasized by the dotted lines; they only visualize the CPs. Also note that, since the
situation is explained for the single group of categories $c_i, \ldots, c_j$, we do not
take the other categories ($c_1, \ldots, c_{i-1}, c_{j+1}, \ldots, c_n$) into account. In fact,
the CP values of the other categories also affect the partitioning in Figure 3.b.) On
the other hand, Figure 3.c shows a case where $CP(c_i,\ldots,c_j)$ is large. In this case,
the category combination $c_i, \ldots, c_j$ does not occur frequently. These categories do
not tend to form a syntactic constituent; rather, they tend to be partitioned into
separate branches of the tree, as shown in Figure 3.d.
Fig. 3. Comparison of CP values

As an example, consider the following sentence: 

birdenbire odaya girdi 

(d) (n) (v) 

suddenly room+dat enter+pst+3sg 
(he/she suddenly entered the room) 

Suppose that $Freq(d,n)=100$, $Freq(n,v)=1000$, and $Freq(d,n,v)=50$. That is, the
adverb-noun combination is followed by a verb half of the time, while the noun-verb
combination occurs frequently but is rarely preceded by an adverb. The category
proximity measures are then $CP(d,n)=50/100=0.5$ and $CP(n,v)=50/1000=0.05$.
Figure 4 shows the situation, which suggests that the noun and the verb can form a
syntactic constituent.



[dnv]:   d <-- 0.5 --> n <-- 0.05 --> v



Fig. 4. CP values for the sentence “birdenbire odaya girdi” 



Definition: Given a sentence of $n$ words $[c_1c_2\ldots c_n]$, $n>1$, the sentence
proximity of the sentence, $SP(c_1,c_2,\ldots,c_n)$, indicates the overall closeness of the
categories in the sentence and is defined in terms of category proximities:

$$SP(c_1,c_2,\ldots,c_n) = \sum_{i=1}^{n-1} CP(c_i,c_{i+1}) \qquad (2)$$



Similar to category proximity, $SP(c_1,\ldots,c_n)$ is a measure of the strength of the
connection between the categories in the sentence. The difference lies in the range of
categories it affects. Instead of determining how probable it is for a particular group
of categories $c_i, \ldots, c_j$ within the sentence to form a syntactic constituent, it
increases or decreases these probabilities for all category combinations in the
sentence. A small SP value is a bias in favour of more syntactic constituents.
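As a check on the definitions, the Python sketch below computes CP and SP from a frequency table. The CP formula is an assumption on our part: we take the CP of a category string to be the frequency of the whole sentence's category string divided by the frequency of that string, a ratio that reproduces every CP value in the “birdenbire odaya girdi” example and in Tables 1(b) and 2(b). SP then follows equation (2).

```python
from typing import Dict, Sequence, Tuple

FreqTable = Dict[Tuple[str, ...], int]

def cp(freq: FreqTable, sentence: Sequence[str], i: int, j: int) -> float:
    """CP of sentence[i..j] (0-based, inclusive), ASSUMED to be
    Freq(whole sentence) / Freq(category string)."""
    return freq[tuple(sentence)] / freq[tuple(sentence[i:j + 1])]

def sp(freq: FreqTable, sentence: Sequence[str]) -> float:
    """Sentence proximity: sum of CP over adjacent category pairs (eq. 2)."""
    return sum(cp(freq, sentence, i, i + 1) for i in range(len(sentence) - 1))

# Frequencies from Table 1(a).
table1 = {("n", "a"): 5992, ("a", "n"): 6973, ("n", "v"): 6639,
          ("n", "a", "n"): 3036, ("a", "n", "v"): 865, ("n", "a", "n", "v"): 367}
s = ("n", "a", "n", "v")
print([round(cp(table1, s, i, j), 3) for i, j in [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]])
# [0.061, 0.053, 0.055, 0.121, 0.424]
print(round(sp(table1, s), 3))  # 0.169
```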
Fig. 5. Comparison of SP values



Figure 5 compares a small SP value with a large SP value. Assume that the ratio
$CP(c_i,\ldots,c_j)/SP(c_1,\ldots,c_n)$ (and the corresponding ratios for category strings
other than $[c_i\ldots c_j]$) are the same in Figures 5.a and 5.c. In this case, the category
proximity measure alone cannot differentiate between the different syntactic
relationships among the categories (as will become clear later): it would force the
same syntactic constituents to be built in both sentences. In reality, however, the fact
that category combinations occur more frequently is a sign of more syntactic
relationships (since they did not occur in that sentence “by chance”). The sentence
proximity measure provides this effect. Figure 5.b shows that the categories
$c_i, \ldots, c_j$ of Figure 5.a form a syntactic constituent, whereas Figure 5.d shows that
the categories $c_i, \ldots, c_j$ of Figure 5.c tend to be partitioned as separate categories.

The two proximity concepts are used together to produce a parse tree for a sentence.
Suppose that we have a sentence of $n$ words $[c_1c_2\ldots c_n]$, $n>1$. The category
proximity values of all category strings in the sentence (except $CP(c_1,\ldots,c_n)$) are
calculated. These values may conflict with each other. For instance, $CP(c_1,c_2)$ and
$CP(c_2,c_3)$ may be small, pushing the corresponding categories to form a group,
while $CP(c_1,c_2,c_3)$ is large, having the opposite effect. The idea is to extract the
real (or best) proximity figures inherent in these data. This is accomplished by taking
the initial CP values of the category strings of length two (i.e. $CP(c_i,c_{i+1})$,
$1 \le i < n$) into consideration, applying the effects of the other CP values to them,
and arriving at final CP values for the category strings of length two. These values
denote the real proximities between each pair of adjacent categories.

For this purpose, the following linear programming problem is formulated and solved.
The equations have $n-1$ variables $x_1, x_2, \ldots, x_{n-1}$ whose values are sought;
$x_i$, $1 \le i < n$, corresponds to $CP(c_i,c_{i+1})$. $p_{i,j}$ and $n_{i,j}$, $1 \le i \le n-2$,
$1 \le j \le n-1$, $i+j \le n$, stand for positive and negative slack variables,
respectively; the slack pair $(p_{i,j}, n_{i,j})$ belongs to the constraint for the category
string of length $i+1$ that starts at $c_j$. The goal is to obtain actual $CP(c_i,c_{i+1})$
values (i.e. the $x_i$'s) such that the sum of the slack variables is minimal.



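$$
\begin{aligned}
\min\quad & \sum_{i=1}^{n-2}\sum_{j=1}^{n-i} \left(p_{i,j} + n_{i,j}\right)\\
\text{subject to}\quad
& x_1 + p_{1,1} - n_{1,1} = CP(c_1,c_2)\\
& x_2 + p_{1,2} - n_{1,2} = CP(c_2,c_3)\\
& \qquad\vdots\\
& x_{n-1} + p_{1,n-1} - n_{1,n-1} = CP(c_{n-1},c_n)\\
& x_1 + x_2 + p_{2,1} - n_{2,1} = CP(c_1,c_2,c_3)\\
& \qquad\vdots\\
& x_{n-2} + x_{n-1} + p_{2,n-2} - n_{2,n-2} = CP(c_{n-2},c_{n-1},c_n)\\
& \qquad\vdots\\
& x_1 + \cdots + x_{n-2} + p_{n-2,1} - n_{n-2,1} = CP(c_1,\ldots,c_{n-1})\\
& x_2 + \cdots + x_{n-1} + p_{n-2,2} - n_{n-2,2} = CP(c_2,\ldots,c_n)
\end{aligned}
$$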
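The linear program can be assembled mechanically. The Python sketch below is our own illustration (variable ordering and function names are assumptions): it builds the cost vector and equality constraints in the form $\min c^\top z$ subject to $Az = b$, with one slack pair per category string of length 2 to $n-1$, in the constraint order of Table 1(c). Non-negativity of all variables is our assumption; the text does not state it. Solving is delegated to an external LP solver; `solve_with_scipy` shows one option via SciPy's `linprog`, imported only when that function is called.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]

def build_lp(n: int, cp: Callable[[int, int], float]):
    """Equality-form LP for a sentence of n categories (0-based inclusive spans).

    Variables z = [x_0 .. x_{n-2}, p_0, n_0, p_1, n_1, ...]: x_k stands for the
    actual proximity of categories k and k+1; constraint k gets one slack pair.
    """
    spans: List[Span] = [(i, i + length - 1)
                         for length in range(2, n)       # strings of length 2..n-1
                         for i in range(n - length + 1)]
    num_x = n - 1
    num_vars = num_x + 2 * len(spans)
    cost = [0.0] * num_x + [1.0] * (2 * len(spans))      # minimise the sum of slacks
    a_eq: List[List[float]] = []
    b_eq: List[float] = []
    for k, (i, j) in enumerate(spans):
        row = [0.0] * num_vars
        for m in range(i, j):                             # x_i + ... + x_{j-1}
            row[m] = 1.0
        row[num_x + 2 * k] = 1.0                          # positive slack
        row[num_x + 2 * k + 1] = -1.0                     # negative slack
        a_eq.append(row)
        b_eq.append(cp(i, j))
    return cost, a_eq, b_eq, spans

def solve_with_scipy(cost, a_eq, b_eq, n):
    """Optional: solve with SciPy (requires scipy to be installed)."""
    from scipy.optimize import linprog                    # imported lazily
    res = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return list(res.x[: n - 1])

# Initial CP values of Table 1(b), keyed by 0-based inclusive spans.
table1_cp = {(0, 1): 0.061, (1, 2): 0.053, (2, 3): 0.055, (0, 2): 0.121, (1, 3): 0.424}
cost, a_eq, b_eq, spans = build_lp(4, lambda i, j: table1_cp[(i, j)])
for row, rhs in zip(a_eq, b_eq):
    print(row[:3], "=", rhs)   # x-coefficients of each constraint, as in Table 1(c)
```

Note that this LP need not have a unique optimum: with the Table 1 data, $x_1$ and $x_2$ are pinned down, but any $x_3$ between about 0.055 and 0.365 gives the same minimal slack sum. A solver may therefore return a different optimal vertex from Table 1(d), which would also change SP'.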

Let $CP'(c_i,c_{i+1})$, $1 \le i < n$, denote the actual category proximity values obtained,
and $SP'(c_1,\ldots,c_n) = \sum_{i=1}^{n-1} CP'(c_i,c_{i+1})$ the actual sentence proximity
value. The tree structure formed with these actual values is called the actual tree. As
mentioned in Section 3, the category string $[c_1\ldots c_n]$ can be partitioned in
$2^{n-1}-1$ ways. We call each such partition a partition tree. The task is to find the
most probable partition tree. To this end, the actual tree is compared with each
partition tree, a score is calculated for each, and the partition tree with the smallest
score is chosen.

Definition: Given an actual tree and a partition tree $P$ of $n$ words $[c_1c_2\ldots c_n]$,
$n>1$, the sentence proximity of the partition tree, $SP_P(c_1,c_2,\ldots,c_n)$, is equal to
the sentence proximity of the actual tree. That is,

$$SP_P(c_1,c_2,\ldots,c_n) = SP'(c_1,c_2,\ldots,c_n) \qquad (3)$$



Definition: Given a partition tree $P$ of $n$ words $[c_1c_2\ldots c_n]$, $n>1$, let the $m$
partitions, $1 < m \le n$, be $[c_1,\ldots,c_{i_1}], [c_{i_1+1},\ldots,c_{i_2}], \ldots,
[c_{i_{m-1}+1},\ldots,c_{i_m}]$ ($1 \le i_1 < i_2 < \ldots < i_m = n$). Then the category
proximity of two consecutive categories in the tree, $CP_P(c_i,c_{i+1})$, $1 \le i < n$, is
defined as follows:



Generation of Sentence Parse Trees Using Parts of Speech 63 



$$CP_P(c_i,c_{i+1}) = \begin{cases} 0, & \text{if } c_i \text{ and } c_{i+1} \text{ are in the same partition}\\[4pt] \dfrac{SP_P(c_1,\ldots,c_n)}{m-1}, & \text{otherwise} \end{cases} \qquad (4)$$



Intuitively, we take the distance (proximity value) between the first and last branches
of a partition tree to be equal to the same distance in the actual tree, and then divide
this distance by the number of branches minus one to obtain an equal distance
between each pair of adjacent branches.
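Equation (4) translates directly into code. In this minimal Python sketch (names are ours), a partition tree is described by the sizes of its parts, e.g. (3, 1) for [nan][v]; the two usage lines reproduce Tables 1(e) and 2(e).

```python
from typing import List, Sequence

def partition_cp(sizes: Sequence[int], sp_actual: float) -> List[float]:
    """CP_P(c_i, c_{i+1}) for i = 1..n-1, given the part sizes of a partition tree.

    Pairs inside one part get 0; each of the m-1 part boundaries gets SP'/(m-1),
    since SP_P equals the actual sentence proximity SP' (eq. 3).
    """
    m = len(sizes)
    if m < 2:
        raise ValueError("a partition tree has at least two parts")
    gap = sp_actual / (m - 1)
    cps: List[float] = []
    for k, size in enumerate(sizes):
        cps.extend([0.0] * (size - 1))   # neighbours within the same part
        if k < m - 1:
            cps.append(gap)              # boundary between part k and part k+1
    return cps

print(partition_cp((3, 1), 0.486))     # [0.0, 0.0, 0.486]  (Table 1(e))
print(partition_cp((1, 1, 1), 0.942))  # [0.471, 0.471]     (Table 2(e))
```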

Once the actual tree has been obtained, it is compared with each possible partition
tree in order to find the most similar one. In fact, the actual tree is the most realistic
tree in terms of showing the syntactic relationships in the sentence. However, since
such “fuzzy” derivations cannot take part in sentence parse trees, we must represent
it by a suitable partition tree.

Definition: Given an actual tree of $n$ words $[c_1c_2\ldots c_n]$, $n>1$, the cumulative
category proximity of a category $c_i$, $1 \le i \le n$, $CCP'(c_i)$, is the total of the
category proximity values between the first and the $i$th categories. That is,

$$CCP'(c_i) = \sum_{k=1}^{i-1} CP'(c_k,c_{k+1}) \qquad (5)$$

The cumulative category proximity for a partition tree $P$, $CCP_P(c_i)$, is defined
analogously. Note that $CCP'(c_1)=0$ and $CCP'(c_n)=SP'(c_1,\ldots,c_n)$, but these
boundary values are not used in the following derivations.

Definition: Given an actual tree and a partition tree $P$ of $n$ words $[c_1c_2\ldots c_n]$,
$n>2$, the similarity score between the two trees, $SS_P$, is defined as follows:

$$SS_P = \sum_{i=2}^{n-1} \mathrm{abs}\left[CCP'(c_i) - CCP_P(c_i)\right] \cdot cg(c_i) \qquad (6)$$

where abs is the absolute value function and $cg(c_i)$ is the category grouping value:

$$cg(c_i) = \begin{cases} 1, & \text{if } c_i \text{ forms a partition by itself}\\ SP'(c_1,\ldots,c_n), & \text{otherwise} \end{cases} \qquad (7)$$

Intuitively, the similarity score between an actual tree and a partition tree is the
total distance traversed when “moving” the branches of the actual tree to make it
identical to the partition tree. A small value of $SS_P$ means greater similarity between
the trees, since less distance is traversed.

The category grouping value accounts for the effect of sentence proximity mentioned
before (Figure 5). Suppose that a category $c_i$ is included within a partition of length
greater than one, as in Figure 5.b, so that $cg(c_i)=SP'(c_1,\ldots,c_n)$. Then an actual tree
with a smaller SP' value (Figure 5.a) will be more similar to that partition tree than
another actual tree with a larger SP' value (Figure 5.c), since $cg(c_i)$ is a multiplicative
factor in equation (6). In other words, among all the possible partition trees, the
former favours those in which $c_i$ appears within a group, whereas the latter favours
those in which $c_i$ forms a separate partition.
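Putting equations (4) to (7) together, the selection of the most similar partition tree can be sketched in Python as below. This is our illustration, not the authors' code: partitions are represented by their part sizes, indices are 0-based internally, and the actual proximities CP' are taken as input (in the method they come from the linear program). With the CP' values of Tables 1(d) and 2(d), it selects [nan][v] and [n][a][n], respectively, matching the worked example.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

def compositions(n: int) -> Iterator[Tuple[int, ...]]:
    """Part sizes of every division of n categories into >= 2 contiguous parts."""
    for k in range(1, n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))

def partition_cp(sizes: Sequence[int], sp: float) -> List[float]:
    """Eq. (4): 0 inside a part, SP'/(m-1) at each of the m-1 part boundaries."""
    gap = sp / (len(sizes) - 1)
    cps: List[float] = []
    for k, size in enumerate(sizes):
        cps += [0.0] * (size - 1)
        if k < len(sizes) - 1:
            cps.append(gap)
    return cps

def cumulative(cps: Sequence[float]) -> List[float]:
    """Eq. (5): entry k is the CCP of the (k+1)-th category (so entry 0 is 0)."""
    out = [0.0]
    for value in cps:
        out.append(out[-1] + value)
    return out

def similarity_score(actual: Sequence[float], sizes: Sequence[int]) -> float:
    """Eq. (6) with the category grouping value of eq. (7)."""
    n = len(actual) + 1
    sp = sum(actual)                                  # SP' = SP_P (eq. 3)
    ccp_actual = cumulative(actual)
    ccp_part = cumulative(partition_cp(sizes, sp))
    singletons, start = set(), 0
    for size in sizes:                                # categories forming a part by themselves
        if size == 1:
            singletons.add(start)
        start += size
    score = 0.0
    for k in range(1, n - 1):                         # inner categories c_2 .. c_{n-1}
        cg = 1.0 if k in singletons else sp
        score += abs(ccp_actual[k] - ccp_part[k]) * cg
    return score

def best_partition(actual: Sequence[float]) -> Tuple[int, ...]:
    """Part sizes of the partition tree with the smallest similarity score."""
    return min(compositions(len(actual) + 1), key=lambda s: similarity_score(actual, s))

print(round(similarity_score([0.061, 0.060, 0.365], (3, 1)), 3))  # 0.088
print(best_partition([0.061, 0.060, 0.365]))  # (3, 1), i.e. [nan][v]
print(best_partition([0.507, 0.435]))         # (1, 1, 1), i.e. [n][a][n]
```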

After the most similar partition tree is chosen, each partition with length greater 
than two is considered as a new sentence and the whole process is repeated. As ex- 
plained in Section 2, the collection of all the most similar partition trees then forms 
the parse tree of the sentence. 



5 Implementation of the Method 

The proposed approach was implemented for Turkish. A corpus of general text con- 
taining about 5,700 sentences was compiled. The average length (number of words) 
of the sentences is 18.6. The corpus includes long sentences having as many as 50 
words. Word categories are derived by using the spelling checker program explained 
in [10]. The frequencies of all category strings in the corpus are collected and stored 
in a database. 

The method was applied to several sentences and parse trees were generated. Due to
lack of space, we present the details for only one short sentence below. The sentence
was taken from a newspaper:

ülkedeki demokratik gelişmeler yetersizdir

(n) (a) (n) (v) 

country+loc democratic progress+pl adequate+neg+cop 
(the democratic developments in the country are not adequate)



Table 1. Calculations for the example sentence (first iteration)

(a) Freq(n,a) = 5,992     Freq(a,n) = 6,973     Freq(n,v) = 6,639
    Freq(n,a,n) = 3,036   Freq(a,n,v) = 865     Freq(n,a,n,v) = 367

(b) CP(n,a) = 0.061       CP(a,n) = 0.053       CP(n,v) = 0.055
    CP(n,a,n) = 0.121     CP(a,n,v) = 0.424
    SP(n,a,n,v) = 0.169

(c) min p1+n1+p2+n2+p3+n3+p4+n4+p5+n5
    subject to
      x1+p1-n1 = 0.061
      x2+p2-n2 = 0.053
      x3+p3-n3 = 0.055
      x1+x2+p4-n4 = 0.121
      x2+x3+p5-n5 = 0.424

(d) CP'(n,a) = 0.061      CP'(a,n) = 0.060      CP'(n,v) = 0.365
    SP'(n,a,n,v) = 0.486

(e) CP_P(n,a) = 0         CP_P(a,n) = 0         CP_P(n,v) = 0.486
    SP_P(n,a,n,v) = 0.486

(f) CCP'(a) = 0.061       CCP'(n) = 0.121
    CCP_P(a) = 0          CCP_P(n) = 0
    SS_P = 0.088
Table 2. Calculations for the example sentence (second iteration)

(a) Freq(n,a) = 5,992     Freq(a,n) = 6,973     Freq(n,a,n) = 3,036

(b) CP(n,a) = 0.507       CP(a,n) = 0.435
    SP(n,a,n) = 0.942

(c) min p1+n1+p2+n2
    subject to
      x1+p1-n1 = 0.507
      x2+p2-n2 = 0.435

(d) CP'(n,a) = 0.507      CP'(a,n) = 0.435
    SP'(n,a,n) = 0.942

(e) CP_P(n,a) = 0.471     CP_P(a,n) = 0.471
    SP_P(n,a,n) = 0.942

(f) CCP'(a) = 0.507       CCP_P(a) = 0.471
    SS_P = 0.036



For each iteration of the process, we give the calculations in a table. The calcula- 
tions involve the actual tree and the most probable partition tree P. Calculations for 
other partition trees are not shown due to the large number of possible trees. Each 
table consists of the following data: category string frequencies (part a), initial cate- 
gory proximities and sentence proximity (part b), linear programming problem (part 
c), actual category proximities and sentence proximity (part d), category proximities 
and sentence proximity for the partition tree P (part e), and cumulative category prox- 
imities and the similarity score between the actual tree and the partition tree P (part f). 

Table 1 contains the data and the results for the category string [nanv] and Table 2 
for the category string [nan]. The final parse tree is shown in Figure 6. 







Fig. 6. Parse tree for the example sentence 

An observation about the computational complexity of the method is worth mentioning
here. The program was executed on a 1.3 GHz Pentium 4 machine. Execution time is
low for sentences of up to 15-20 words: parsing such a sentence takes about 4-5
seconds. This computation time is very promising given the size of the search space,
since we are working with a nearly unrestricted context-free grammar formalism. The
reason is that the search space is pruned at each iteration: only the best partition is
taken into account.



6 Conclusions 

In this paper, we proposed a new method for generating parse trees of natural
language sentences. Because rule-based techniques are limited in formalizing natural
languages, statistical techniques are gaining popularity, and this was the direction
pursued in this research. The method is based on the information inherent in the
word categories (parts of speech) of the words within sentences. Using the frequency
and order of these categories, we formulated a method that makes the syntactic
relationships in sentences explicit.

The parse trees that the method produces are equivalent to those that can be
generated by a slightly restricted context-free grammar. The grammar formalism used
is more general than those used by other similar approaches.

The approach was tested for Turkish using a corpus of about 5,700 sentences.
Although an exact evaluation is not possible because no complete grammar exists for
the language, the results are promising: the parse trees produced by the program for
about half of the sentences appear to be correct. One strength of the method is its
ability to generate plausible parses for complex sentences. However, it also produced
parses that fail to capture the syntactic relationships inside the sentences or that
contain slightly misplaced constituents. We expect better results as the size of the
corpus increases.

An attractive area for future research is extracting a grammar using these parse 
trees. This will be an important contribution if it becomes possible to obtain a robust 
grammar, since no comprehensive grammars have been written yet. It may also pro- 
vide feedback to rule-based grammar studies. 
Abstract. Even though search engines cover billions of pages and perform
quite well, it is still difficult to find the right information among the
returned results. In this paper we present a system that allows a user to
re-rank the results locally by augmenting a query with positive example
pages. Since it is not always easy to come up with many example pages,
our system aims to work with only a couple of positive training examples
and without any negative ones. Our approach creates artificial (virtual)
negative examples based on the returned pages and the example pages
before training commences. The list of results is then re-ordered
according to the output of the machine learner. We further show that our
system performs sufficiently well even if the example pages belong to a
slightly different (but related) domain.



1 Introduction 

Finding information has always been one of the most popular tasks on the Internet.
Whereas human-maintained web directories initially seemed capable of providing
access to all interesting information, they were not able to keep up with the
exponential growth of web pages on the Internet. Nowadays general-purpose search
engines seem to provide acceptable results, but their greatest strength, their
generality (i.e. their capability to retrieve information for any kind of search), is at
the same time their greatest weakness.

Danny Sullivan from http://searchenginewatch.com estimated that on September
2nd, 2003, the search engine with the biggest index of web pages was Google, with
about 3.3 billion textual documents indexed ([11]). This means that a vast amount of
data is available to the user. The main problem for a user is to specify a query that
makes the search engine retrieve the documents she is looking for. Search queries are
often hard to specify: a survey by the NEC Research Institute in Princeton, New
Jersey, revealed that “up to 70% of web users typically type in only one keyword or
search term” ([3]). This seems to be a widespread phenomenon. The survey also found
that almost 50% of the queries contained just one word and a mere 30% contained two
keywords, even amongst the queries made by the institute’s own staff.

We look at the search problem from the psychological perspective of a user (sitting
in front of a terminal to conduct a search), and we address two specific areas, namely
query formulation and ranking. Expressing what one wants is not always easy.
Although a query may look trivial, there are many subtleties in one’s intention that
are difficult to make explicit. In the research paradigm of case-based reasoning [7],
people usually apply some of their past cases/experience to solve the current
problem. Hence, we hypothesize that formulating a search query may become easier
if the user can provide some example pages that she believes to be similar/relevant.
Another area of concern in a search process is the ordering of the information
returned by a general-purpose search engine. Most of today’s powerful engines can
find useful information, but the returned documents are ranked only according to a
universal formula that may not accommodate individual needs. The user has to
further identify the web pages that satisfy her current needs; this may require her to
flip through many pages of search results and go through the details of each and
every returned document. In other words, the user still has to find the needle, albeit
in another haystack, and this can be quite discouraging. The ordering of the results is
therefore very important. When a user searches the Internet, it does not help if a
search engine returns plenty of positive documents but none of them appears within
the top 50 search results. Users are generally unwilling to flip through many negative
documents to find the first positive one.

Our aim is to create a system that re-ranks the pages locally on the client machine
according to the implicit constraints inherent in the positive example pages provided
by the user. Even if the list of documents returned by the general search engine
contains a very low percentage of relevant documents, these should be moved to the
top of the list. Another reason for performing the re-ranking on the client machine
rather than on the server side is to protect the user’s privacy: a user can now perform
more specific searches without exposing herself too much to the search engine
providers.

The rest of this paper is organized as follows. Section 2 gives a brief overview of
related work. The design of the system is described in Sect. 3, where we pay specific
attention to feature selection, the creation of artificial negative examples and our
methodology of cost-benefit assessment. Three experiments with different aims are
reported and discussed in Sect. 4. Section 5 summarises the work and outlines future
enhancements.



2 Related Work 

Current research on information retrieval on the Internet often tries to overcome the
weaknesses of general-purpose search engines. For example, many domain-specific
search engines have been developed. [5] proposed the use of AI techniques to improve
the results of general search engines and discussed different approaches to obtaining
better results in specific domains. They discuss systems like Ahoy!, which uses
heuristics to find homepages, and ShopBot, which tries to assist users with online
shopping. [13] built a search engine for abstracts of papers in IEEE Transactions. The
system responds to a query with results that are linked to other related documents,
together with a set of suggestions for query refinement. The authors used an ontology
that had previously been derived from the collection with the help of other resources
such as WordNet. [8] developed a system that automatically constructs hierarchies for
small domains, for example intranets or local web sites. We could not adopt these
approaches for our system, since our aim was not to create a search engine for a
single domain but a search engine that can be used in any domain. Creating an
ontology that covers every single topic on the Internet is not feasible.

Some systems use general-purpose search engines and try to improve a user’s query
to get better results. The system in [6] tries to select a few features from a set of
documents, which are then used to modify a search query. This approach is quite
similar to that of [9], which generates keyword spices: keywords that can be added to
any query to restrict it to a certain domain. Both require a lot of preliminary human
work to find positive and negative examples. Once this is done, the result is an easily
implemented and fast domain-specific search engine. Some of the systems are not
strictly fixed to one domain, but most of them require a lot of work to adapt to a new
domain, and they usually work only if many more example documents are available
than we have.

A completely different approach, which has some similarities with our system, is
discussed in [4]. The authors try to make use of the processing power of the user’s
computer. After all noun phrases are extracted from a set of documents, the user has
to select some of those phrases. As a final step, a Kohonen Self-Organizing Map
(SOM) is used to cluster the web pages into a two-dimensional map according to the
selected noun phrases. Their feature selection process is similar to ours. Instead of a
clustering, however, we wanted to achieve a good ranking from the user’s point of
view.

Other papers deal with creating artificial or virtual examples. For example, [10]
incorporates transformation invariances by creating artificial examples. Such
approaches have in common that they use domain-specific knowledge to create new
training samples by modifying classified training data. We do not have enough
classified documents available, and in particular no negative ones, so we had to find
a new way to create negative examples.

In contrast to some similar attempts to re-rank the results of search engines, we focus
on optimizing the ranking of the first few results: users of search engines usually scan
only the top results and abandon the search if they cannot find any positive
documents. This led us to the use of example documents and the creation of artificial
negative examples.

3 Design 

In our proposed system, the user specifies a general query that roughly covers her
domain and, in addition, provides a set of example pages within her domain of
interest. Since coming up with many training cases is an expensive activity, our
system is able to cope with very few example pages, to minimize the tedious workload
for the user. The system is also designed to cope with only positive example
documents and no negative ones. For example, we used just three positive and no
negative example documents in all our experiments.

Figure 1 illustrates the different components of our system. Starting with the 
provided query, the system uses a general purpose search engine to retrieve a list 
of documents for the given query. It downloads a number of them and uses them 
together with the initial example documents in the feature extraction process to 
produce feature vectors for all documents. 

The system creates artificial (virtual) negative training samples and adds them 
to the existing pages to form the training set for the machine learner (ML). The 
confidence of the ML in each classification is used to re-rank the results. 
Documents with equal confidence are then fine-ranked so that their ordering is 
better than a random one. 

The computed ranking is presented to the user, who can look at the 
corresponding web pages and provide feedback by specifying the correct 
classification. This feedback can be used to further refine the ranking. While 
it is desirable to minimize user involvement, we assume that giving feedback on a 
ranked, pre-filtered list of pages requires significantly less effort from the 
user than looking through a vast number of irrelevant web pages. 



3.1 Feature Extraction 

To make learning more efficient, only a subset of the words in the documents is 
used. In our experiments we focused mainly on nouns, because this decreases the 
number of features significantly while still providing sufficient information 
for our task. To reduce the number of features further, we also applied simple 
stemming and treat all words case-insensitively. 
We can keep the number of features reasonably small if we extract them from the 
training documents provided by the user. However, if we select our features only 
from the few positive example documents, those features are not adequate to 
represent the negative documents. To overcome this problem, we also have to 
select features from the documents in the result list returned by the search 
engine. Since the number of returned documents can be quite high, we restrict the 
total number of features to stay below a threshold. In most of our experiments, 
this limit was set to around 300 words, with about half of them selected from the 
example documents. 

Tf-idf is a way to represent the significance of a feature in a given corpus. 
It is determined by the term frequency (tf) as well as by the inverse of the 
document frequency (df), i.e. the number of documents in which a term occurs. 
We also take into account that the length of a document plays a role in the 
weight of each feature. The basic idea is: if a word occurs several times in a 
short document, it is more important than the same absolute frequency in a 
longer document. We calculate the weight w of a feature using the following 
formula: 

$$ w = \left(1 + \log\frac{tf \cdot maxsize}{docsize}\right) \cdot \log\frac{n}{df} $$

where n is the total number of documents, maxsize is the size of the largest 
document and docsize is the size of the document for which the current feature 
value is calculated. 
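The weighting formula can be transcribed directly. The sketch assumes the natural logarithm (the base is not stated) and document sizes measured in words; `feature_weight` is a hypothetical name. Because docsize ≤ maxsize, the argument of the first logarithm is at least 1 whenever tf ≥ 1, so the length-normalised factor never drops below 1.

```python
import math


def feature_weight(tf, df, n, docsize, maxsize):
    """Length-normalised tf-idf: (1 + log(tf * maxsize / docsize)) * log(n / df).

    tf: occurrences of the term in this document
    df: number of documents containing the term
    n: total number of documents
    docsize, maxsize: size of this document and of the largest document
    """
    if tf == 0 or df == 0:
        return 0.0  # term absent: no weight
    return (1 + math.log(tf * maxsize / docsize)) * math.log(n / df)


# The same absolute frequency counts more in a short document than in a long one.
short_doc = feature_weight(tf=3, df=2, n=10, docsize=100, maxsize=1000)
long_doc = feature_weight(tf=3, df=2, n=10, docsize=1000, maxsize=1000)
print(short_doc > long_doc)  # True
```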



3.2 Fine Ranking 

When several documents share the same confidence level, they would be presented 
in random order within this group. A fine ranking is therefore applied to the 
documents within such a group so that their ordering is better than a random 
ordering. The ordering scheme is based on Euclidean distances between feature 
vectors in a subspace of the feature space. 

The idea is to restrict the feature space used for distance calculations to the 
features (attributes) that are most important for the set of positive documents. 
We select the attributes that occur in at least 
$\sqrt{\#\,\text{positive documents}}$ positive documents (and at least 10 
attributes). Then we compute the mean of the shortened feature vectors of the 
positive documents in the current training set and rank the documents within one 
plateau according to the Euclidean distance of their feature vector from this 
mean vector. 
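The fine ranking can be sketched as below. Assumptions beyond the text: a feature counts as occurring in a document when its value is greater than zero, and "at least 10 attributes" is read as a fallback that takes the features occurring in the most positive documents whenever fewer than `min_features` pass the threshold. `fine_rank` and its parameters are hypothetical names.

```python
import math


def fine_rank(plateau, positives, min_features=10):
    """Order documents that share one confidence value (a 'plateau').

    plateau, positives: lists of equally long feature vectors.
    """
    n_features = len(positives[0])
    threshold = math.sqrt(len(positives))
    # Number of positive documents in which each feature occurs (value > 0).
    counts = [sum(1 for doc in positives if doc[j] > 0) for j in range(n_features)]
    selected = [j for j in range(n_features) if counts[j] >= threshold]
    if len(selected) < min_features:
        # Fallback (assumption): take the features found in the most positives.
        selected = sorted(range(n_features), key=lambda j: -counts[j])[:min_features]
    # Mean of the shortened positive feature vectors.
    mean = [sum(doc[j] for doc in positives) / len(positives) for j in selected]

    def distance_to_mean(doc):
        return math.sqrt(sum((doc[j] - m) ** 2 for j, m in zip(selected, mean)))

    # Closest to the positive mean first; ties keep their original order.
    return sorted(plateau, key=distance_to_mean)
```

Python's `sorted` is stable, so documents at exactly the same distance keep the order the classifier produced.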

Although this method of fine ranking may not be optimal, it performed better 
than average in all our test cases. If one of the more sophisticated MLs (e.g. a 
Support Vector Machine) is used or the training set grows, the confidence values 
that the classifier assigns to individual documents usually differ enough that 
fine ranking can be left out. 
3.3 Cost-Benefit Analysis 

The standard evaluation measures for an information retrieval system are 
usually precision, recall and the F-value, which combines the two. In our case, 
they do not necessarily reflect the performance perceived by the user of our 
system: the ranking is more important than the actual classification. Another 
thorny issue in our design is the limited number of documents available for 
training. 

The main interest of a user is to have as many positive results as possible 
ranked at the top of the list. Applying a threshold to one of the standard 
measures in order to evaluate a ranking is not meaningful, because it does not 
capture the overall ranking. We therefore created an alternative visualisation 
scheme that is similar to the ROC (Receiver Operating Characteristic) curve¹. 
This measure covers the whole ranking and corresponds to the user’s interest. We 
call the number of negative documents a user has to look at before she gets the 
next positive document the “cost”, and the number of positive documents the user 
gets before she has to look at the next negative document the “benefit”. We plot 
the (maximum) benefit one can get (vertical axis) against the cost it takes to 
reach it (horizontal axis). 
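The cost-benefit points can be computed from the order in which the user actually sees the documents, i.e. after every re-ranking triggered by feedback has taken effect. The minimal sketch below assumes that order is given as a list of booleans (True for a positive document); `cost_benefit_points` is a hypothetical name. Applied to a sequence + + + − + − −, consistent with the worked example for Fig. 2, it yields the points (0,3), (1,4), (2,4) described there.

```python
def cost_benefit_points(labels):
    """(cost, benefit) points for documents in viewing order.

    cost    = number of negative documents seen so far
    benefit = total number of positives obtained before the next negative
    """
    points = []
    cost = benefit = 0
    for is_positive in labels:
        if is_positive:
            benefit += 1
        else:
            points.append((cost, benefit))  # reached before paying this negative
            cost += 1
    points.append((cost, benefit))  # final point after the last document
    return points


print(cost_benefit_points([True, True, True, False, True, False, False]))
# [(0, 3), (1, 4), (2, 4), (3, 4)]
```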




Fig. 2. An example plot showing the properties of our evaluation method. The set of 
15 documents consists of 5 positive and 10 negative documents. 

A user starts looking at the documents, beginning with the first one. As soon 
as she encounters a negative document, she gives feedback to the system by 
classifying the new document as negative and all positive documents she has 
looked at since the last negative document as positive. After the remaining 
documents have been reclassified, the user continues to examine the results. In 
the optimal case, all positive documents come before all negative documents in 
the ranking, which means maximum benefit at zero cost. The documents retrieved 
by the query are independent of the set of sample documents. We therefore regard 
the Google ranking as essentially “random” with respect to the user’s interest: 
Google sorts pages by their importance, but this sorting is made for the whole 
web community, not for a single user (for Google’s PageRank technology see [2]). 
In the evaluation of our re-ranking, we aim to obtain a ranking better than 
Google’s (which is basically a random ranking), so we consider the Google ranking 
as the lower bound of our performance criterion. 

1 A ROC curve shows the relationship between false positives and true positives, 
or the recall versus the fallout. 
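If Google’s ordering is treated as a uniformly random permutation, as the text suggests, the expected baseline curve has a closed form. For P positive and M negative documents, each positive document precedes the (c+1)-th negative one with probability (c+1)/(M+1), so the expected benefit at cost c is $E[b(c)] = P\,(c+1)/(M+1)$ for $c = 0, \ldots, M$. This derivation is our own illustration and not necessarily how the average curve in Fig. 2 was produced; the sketch checks it against brute-force enumeration on small inputs, and both function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_random_curve(n_pos, n_neg):
    """Expected benefit at each cost c = 0..n_neg for a uniformly random ranking."""
    return [Fraction(n_pos * (c + 1), n_neg + 1) for c in range(n_neg + 1)]


def brute_force_curve(n_pos, n_neg):
    """Average over all distinct label orderings (small inputs only), as a check."""
    orders = set(permutations([True] * n_pos + [False] * n_neg))
    totals = [Fraction(0)] * (n_neg + 1)
    for order in orders:
        seen_pos, cost = 0, 0
        for is_positive in order:
            if is_positive:
                seen_pos += 1
            else:
                totals[cost] += seen_pos  # positives seen before this negative
                cost += 1
        totals[n_neg] += seen_pos  # after the last negative: all positives
    return [t / len(orders) for t in totals]


print(expected_random_curve(2, 3) == brute_force_curve(2, 3))  # True
```

For 5 positive and 10 negative documents, as in Fig. 2, the formula gives an expected benefit of 5/11 at zero cost.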

The average curve in Fig. 2 represents the performance obtained by the 
underlying Google query. In this example setting, we show the performance of 
two fictitious classifiers. Consider the first classifier. The first three 
documents (in this example) are positive documents (+), i.e. no negative 
document (−) has been seen yet. We have a point at (0,3) because the next 
document is negative, which means that we have to look at zero negative 
documents to get a maximum of three positive ones. After the first negative 
document (and therefore the first feedback), there is exactly one positive 
document before the next negative one, so we have to look at one negative 
document to get a total maximum of four positive ones (point (1,4)). According to 
the arrival pattern of this set of documents for the first classifier, the points 
(0,3), (1,4), (2,4) etc. are plotted on the chart. The chart shows that our 
performance measure is quite intuitive: the first classifier is closer to the 
optimum than the second one and therefore performs better. The ranking for the 
whole set of documents for the first classifier, including feedback, is 



3.4 Creating Artificial Negative Examples 

The most challenging requirement for our system is to cope with only a very 
small training set. This set is not only small, but it contains only positive sample 
documents and no negative ones at all. This makes the training difficult because 
a classification task generally anticipates two distinct classes of labelled data. 

Our system receives feedback on each negative document observed. Suppose, for 
example, that our system starts training with three documents and the first 
document the user looks at is a negative one. As soon as the user communicates 
her observation to the system, the training set consists of three positive 
documents and one negative document. Although the size has increased, this is 
still not big enough for a useful training process with most MLs, and this has 
led us to the creation of artificial negative documents. 

After normalization, the values of all features lie in [0,1]. The three positive 
sample documents are therefore located somewhere in a unit hypercube. If we just 
created documents anywhere within this hypercube, we could end up creating a 
negative document within a cluster of positive documents. 

We assume that there are two points in the hypercube that will not be present 
in the set of positive documents: the point zero $(0, 0, \ldots, 0)^T$ (Artificial 
Document 0, AD0) and the point one $(1, 1, \ldots, 1)^T$ (Artificial Document 1, 
AD1). A document that is very close to zero has nothing to do with the domain 
our user is interested in: the feature vector contains the attributes that are 
most common in the example set, and we assume that other documents within our 
target domain will share at least some of these attributes. A document located 
at AD1 would have the highest frequency for all the attributes, but such a 
document is practically impossible, because the feature vector contains 
high-frequency features from the positive documents as well as from some of the 
negative documents. 

We also create further, randomized negative documents. First, we compute the 
subset of features that are insignificant for all currently known positive 
documents. These are mainly attributes that do not occur in any of the known 
positive documents, and they are probably characteristic of some cluster of 
negative documents. We use this set of features to compute new artificial 
instances: 

- For each new artificial negative document, construct a feature vector that 
has a value of zero for all features. 

- For a certain percentage p of the features in the previously computed subset, 
set the value to a random number in [0,1]. 

Empirical evaluations showed that it is reasonable to choose 0.1 ≤ p ≤ 0.4. For 
the experiments in Section 4, we chose p = 0.4. 
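The construction of AD0 and the randomized negatives can be sketched as follows, using five random documents and p = 0.4 as in the experiments. How “insignificant” is decided is not specified in the text; the sketch assumes a feature is insignificant if its value is at most `eps` in every known positive document. `artificial_negatives` and its parameters are hypothetical names.

```python
import random


def artificial_negatives(positives, n_random=5, p=0.4, eps=0.0, rng=None):
    """Hypothetical sketch: AD0 plus n_random randomized negative documents.

    positives: known positive feature vectors with values in [0, 1].
    """
    rng = rng or random.Random()
    n_features = len(positives[0])
    # Assumption: 'insignificant' = value <= eps in every known positive.
    insignificant = [j for j in range(n_features)
                     if all(doc[j] <= eps for doc in positives)]
    negatives = [[0.0] * n_features]  # AD0, the all-zero document
    k = round(p * len(insignificant))  # how many features to randomize
    for _ in range(n_random):
        vec = [0.0] * n_features  # start from zero for all features
        for j in rng.sample(insignificant, k):
            vec[j] = rng.random()  # uniform in [0, 1)
        negatives.append(vec)
    return negatives
```

Passing a seeded `random.Random` makes individual runs reproducible, which helps when averaging performance over many runs.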

We are unsure about the usefulness of AD1 in our classification task. For some 
combinations of scenarios and machine learners we got a better performance by 
using both AD0 and AD1; for others, the performance was better without AD1. We 
have not yet been able to determine whether it is better to omit AD1 or not. The 
benefit of AD0, however, is more obvious. If a document is completely off-topic 
and has nothing to do with any other document, it is located at point zero or at 
least close to it. It is likely that a bigger set of documents (e.g. 300) 
contains a few such documents. Experiments showed that using AD0 can improve the 
performance. Therefore, we finally decided to use just AD0 and random documents 
in all our experiments. 

In Fig. 3 we show some typical observations. As expected, the performance 
without any artificial training examples is worst at the beginning. After the 
user has given feedback 15 times, however, it outperforms the runs with 
artificial examples, because the system has gained enough “real” examples to 
train on and these are better than the artificial ones. About 15 steps later the 
other runs take the lead again, since the collected “real” documents do not 
represent all negative examples. The runs with more random examples perform 
slightly better at the beginning; later on, those with fewer random examples are 
better. Similar behaviour can be observed in other examples. Unfortunately, this 
does not hold for all combinations of machine learners and scenarios, and more 
in-depth research is required to identify the relationship among the ML, the 
data profile and the artificial data. In any case, it is useful to create some 
artificial negative documents: they give better results, especially for the top 
places in the ranking, and these are the most important ones in the eyes of the 
average user. 
Fig. 3. Performance with different numbers of random examples, overlay scenario, 
averaged over 20 runs, 91 documents. 



4 Experiments and Evaluation 



Our architecture is generic and independent of any specific search engine. 
There are many general-purpose search engines available on the internet; for the 
current experiments we chose Google as the search engine to which the queries 
were sent. We built a few scenarios for different kinds of domains to test 
whether the system is able to cope with any domain specified through the query 
and the example documents provided by the user. We looked at either the first 
100 or the first 500 results returned by Google. As we evaluated only HTML pages 
(and not PDF, PS or PPT documents, for example), the actual size of our test set 
is usually somewhat smaller. 

The results from the various queries are fed to a few MLs. We use tf-idf 
(normalized by document length), simple stemming, a stop list and only nouns and 
proper nouns (NN, NNS, NNP and NNPS). For all runs we use six artificial negative 
documents (AD0 plus five randomly created documents within the hypercube). 

Our system offers a choice of classifiers, but for clarity we focus on three 
basic machine learning algorithms and three meta-learning schemes. The 
classifiers were obtained from WEKA [12]: (1) J48, a decision tree learner that 
implements the popular C4.5 algorithm, (2) IB1, a nearest-neighbour learner, and 
(3) SMO, an implementation of the Sequential Minimal Optimization algorithm, one 
of the fastest methods for training Support Vector Machines (SVM). 
We also applied three different schemes to boost the performance, namely 
bagging and boosting for the J48 learner and co-training for the J48 and SMO 
classifiers. Co-training [1] is an algorithm that uses unlabeled data to augment 
a much smaller set of labelled data. In particular, it uses two distinct views 
of the same data to train two classifiers, uses the most confident predictions 
of both to augment the labelled data, and repeats this process for a number of 
iterations. In our current co-training implementation, we split the attributes: 
one machine learner uses all the attributes at even-numbered positions 
(attributes 0, 2, 4, 6, ...), while the other one uses the odd-numbered 
attributes. 
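The even/odd attribute split and the co-training loop can be sketched as below. This is a simplified illustration under several assumptions: a nearest-centroid scorer stands in for J48 or SMO, each view moves its single most confident positive and negative example into the labelled set per iteration (how many examples the authors add per round is not stated), and all names are hypothetical.

```python
import math


def split_views(vector):
    """View A = attributes at even positions (0, 2, 4, ...), view B = odd positions."""
    return vector[0::2], vector[1::2]


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def confidence(view_vec, pos, neg):
    """Stand-in classifier (nearest centroid, not J48/SMO): larger = more positive."""
    return math.dist(view_vec, centroid(neg)) - math.dist(view_vec, centroid(pos))


def co_train(labelled, unlabelled, iterations=5):
    """labelled: list of (vector, is_positive) containing both classes
    (e.g. thanks to artificial negatives); unlabelled: list of vectors."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(iterations):
        for view in (0, 1):
            if len(pool) < 2:
                return labelled
            pos = [split_views(x)[view] for x, y in labelled if y]
            neg = [split_views(x)[view] for x, y in labelled if not y]
            scores = [confidence(split_views(x)[view], pos, neg) for x in pool]
            best = max(range(len(pool)), key=scores.__getitem__)
            worst = min(range(len(pool)), key=scores.__getitem__)
            if best == worst:  # all scores equal: nothing to learn from
                return labelled
            # Pop the higher index first so the lower index stays valid.
            for idx, label in sorted([(best, True), (worst, False)], reverse=True):
                labelled.append((pool.pop(idx), label))
    return labelled
```

Each view only sees half of the attributes, so the examples one view labels confidently can add information for the other view in the next round.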

We evaluated different settings for the number of artificial examples, in 
particular whether to use AD0 and AD1. Since a test run that involves randomly 
generated instances has non-deterministic performance, the charts show the 
average performance over several runs (at least 20). 

4.1 Scenario 1 

The aim of this experiment is to see whether the system can cope with a 
situation in which only very few documents are actually positive. 

Imagine a search by a tourist looking for information about Australia. As 
already mentioned in the introduction, many users type in only one keyword, 
which in this scenario is “Australia”. Naturally, most of the results returned 
by Google will not contain any useful information for her: many returned pages 
are about the government, newspapers, commerce, education etc. One problem 
inherent in this classification is that the number of positive documents is 
very small, while the negative documents come from a diverse set of domains. 
This makes it difficult for our ML to identify the very small subdomain the user 
is interested in. Another problem is that many of these pages are the welcome 
pages of bigger websites. These documents have hardly any content; they are 
merely some kind of menu. 

If you just search for “australia travel”, you will miss pages about 
accommodation and so on. Hence, the three initial documents for this example 
were obtained with three specialised queries. Each of these queries contained 
two keywords: the first was “australia” and the second was either “travel”, 
“tourist” or “accommodation”. From each of the three Google result lists (first 
ten documents), we selected one document that, in our opinion, is of interest to 
a tourist. 

In this scenario, the number of useful pages among the first 100 results 
returned by Google is very small. Fig. 4 shows that most of the MLs deliver the 
positive pages (the right kind of information) much earlier than the original 
ranking offered by the underlying search engine. In other words, the user finds 
the information much more quickly and does not have to flip through many screens 
or read many irrelevant documents. SMO performs best of all the MLs, followed by 
boosting and bagging of J48. The two co-training schemes did not pay off, 
possibly because the total number of positive cases in this scenario is too small 
for the co-trainer to take advantage of. IB1 is worse than Google until the cost 
reaches five, but eventually surpasses Google. As there are only four positive 
documents in this analysis, the differences in performance are very small, but 
overall all MLs outperform Google. 

Fig. 4. Performance of different machine learners for the Australia query example 
(Scenario 1). Ranking of the first 87 Google results. 

4.2 Scenario 2 

In this scenario, the user has a better idea of what she is looking for. The aim 
of Scenario 2 is to find out how well the system copes with a situation in which 
all documents belong to roughly the same domain, i.e. the vocabulary is similar 
for (almost) all documents, and the target set is a very specialized subdomain. 

This scenario is, in fact, based on a real-world problem we encountered. We 
wanted to connect a laptop to a TV via its TV-out socket and watch a video. While 
all “normal” graphics (windows, ...) were displayed on the TV without problems, 
we only saw a black box where the video should have been. In our search, we got 
the first positive results with a very specific query containing four words: 
“laptop tv video overlay”. As there are several different solutions to this 
problem, finding only one solution was not necessarily sufficient. We used three 
pages of the Google results as our example documents, and the task for our system 
was to identify all the other relevant ones. The demands on the system are very 
different from the first example. The query is already very specialized (there 
are four terms in the search query), and the domains of many of the documents are 
closely related to each other. Most of the good results are FAQs or questions and 
answers found in forums. These pages often contain lots of advertisements, menus 
and other material that has nothing to do with the domain we are looking for, so 
the proportion of text on the page that deals with the topic of interest is often 
very low. The pages returned in Scenario 1 could be classified relatively quickly 
by a human user; in this scenario, classification is difficult, even for a human. 

Fig. 5. Performance of different machine learners for the overlay query example 
(Scenario 2). Ranking of the first 91 Google results. 

We used 91 pages from Google for our analysis. According to Fig. 5, all MLs 
perform better than the original Google ranking. The IB1 learner performs well; 
although it uses a different distance measure (distances to neighbours) than our 
fine ranking, its performance is a good indication that distances can be used to 
generate a simple ranking. The behaviour of J48 depends on our fine ranking 
scheme, and it delivers a good performance. All meta-learning mechanisms 
(bagging, boosting and co-training), when applied to J48, further improve its 
performance. SMO outperforms all the other classifiers, and even co-training does 
not improve its results. Co-training algorithms need plenty of both positive and 
negative instances within the unlabeled set; otherwise they are unable to assign 
high confidence to unlabeled documents in order to augment the labelled set. In 
Scenario 2 we have only seven positive documents out of 91 test instances, which 
is not enough to support co-training. 

4.3 Scenario 3 

A user does not necessarily have any sample pages from the domain of interest, 
but she may have some documents from a related domain. This last scenario 
aims to test the generalisation ability of our system. 

The user in this example is interested in a recipe for baking an apple pie. She 
does not have any apple pie recipes, but she has other recipes and can give them 
as input to our system to identify similar pages containing the recipe she is 
interested in. One of the sample documents she provides is about cooking fish, 
one about baking bread and one about baking some other pie; the query she 
provides is “apple pie”. Here the sample pages are not directly from the domain 
of interest but from similar domains. Our system has to generalize from the given 
data to be able to identify new pages from the related domain. 




Fig. 6. Performance of different machine learners for the apple pie (recipe) query ex- 
ample (Scenario 3). Ranking of the first 462 Google results. 



In this scenario, we used the top 462 pages returned by Google for analysis, 
about 40% of which are positive documents. This situation is better than in the 
previous two scenarios, where only 5% to 8% of the test instances were positive. 
The number of useful pages even exceeds what the user needs: who is interested 
in reading 204 different apple pie recipes? In Fig. 6, all MLs perform much 
better than Google, and SMO shows superior results. The SMO-based learners 
continue to lead the field: SMO ranks more than 80 positive pages at the top of 
the list in the first step, and its co-training version even ranks 100 positive 
documents there. The results delivered by IB1 are better than those of J48 and 
all its meta-learners. J48 and its co-training counterparts produce the 
second-weakest and third-weakest rankings. Bagging and boosting improve the 
performance of J48, and boosting is, once more, better than bagging. 

This scenario demonstrates the generalization capabilities of our system: the 
three sample documents are related to, but do not belong to, the target domain. 
The system even achieved a good performance when the fish recipe was provided as 
the only positive example. Co-training is only useful if there are enough 
positive documents in the data set (negative ones are normally abundant). But 
even without co-training, the SMO classifier shows very good performance, even on 
extreme data and with small training sets. 

5 Conclusions and Future Work 

In Sect. 2 we looked at different systems that try to solve similar problems. 
All of them either concentrate on one specific domain only or have to be adapted 
to work in a new area. We designed our system to eliminate these limitations and 
to be as general as possible, without concentrating on special domains, so that 
no adaptation is required. Another assumption of these related systems is the 
availability of many preclassified training pages, which is practically 
infeasible in our setting because we are not limited to one domain. We have 
demonstrated that our new approach is able to cope with a very small number of 
training examples. This includes the generation of artificial examples to 
compensate for the lack of negative instances. 

From a user’s point of view, the time it takes to re-rank a few hundred 
documents is negligible. The main bottleneck, as in similar systems, is the time 
it takes to download all the documents. 

Under the extreme conditions we examined, our system cannot always perform as 
well as a search engine developed exclusively for a specific domain. However, it 
would be resource-expensive and time-consuming to create domain-specific search 
engines for arbitrary domains. Hence, from the perspective of a user who is 
searching for information on the web, our system performs quite well: it 
consistently re-ranks relevant pages to the top of the list, regardless of 
whether the query is very general (Scenario 1) or very specialized (Scenario 2), 
and it is also able to generalise from one domain to a related one (Scenario 3). 

In spite of our good results, there is still room for improvement. Currently 
we use only the content of a document but not its structure. The information 
contained in headings is usually more important than normal text, so the feature 
selection could be modified to reflect the different importance of different 
parts of a document. Feature selection could also be performed after every user 
feedback to capture the very specific knowledge contained in the new 
classification provided by the user. In our experiments, we mainly used five 
random artificial examples; it would be useful to determine a good trade-off for 
the number of artificial instances. Another possible extension is related to 
co-training. For test sets with a limited number of positive documents, this 
technique cannot work properly and the performance suffers. If it were possible 
to coarsely estimate the percentage of positive documents, we could adapt the 
number of runs accordingly to improve the overall performance. 
Abstract. Integration of new utterances into context is a central task in 
any model for rational (human-machine) dialogues in natural language. 
In this paper, a pragmatics-first approach to specifying the meaning of 
utterances in terms of plans is presented. These plans are computed dur- 
ing a dialogue on the basis of information about the current situation 
that is updated continually. New contributions are integrated into a dia- 
logue if they help in establishing a new plan, or if they deliver important 
information for executing an already established plan. 



1 Introduction 

Complex computer-based systems are becoming more and more important in 
everyday life. Such systems should provide natural language interaction to users, 
as this is the most convenient way to control a complex system. Many users are 
overwhelmed when they have to learn all the functions a specific device offers, 
whereas it is easy for them to express individual wishes that the system should 
process. 

1.1 A Pragmatics-First View on Rational Dialogues 

Rational dialogues that are based on Grice’s maxims of conversation serve as 
a communicative tool in jointly executing a task in the domain of discourse 
(called the application domain) by following a plan that could solve the task 
assigned to the participants of the dialogue. Therefore, the interpretation of 
new contributions and their integration into a dialogue is controlled by global 
factors (e.g. the assumption that all dialogue participants behave in a cooperative 
manner and work effectively towards the completion of a joint task) as well as by 
local factors (e.g. how does the new contribution serve in completing the current 
shared plan?). Within this framework, the paper focuses on two issues: 

- How can we determine whether a new contribution is related to executing 
the current step in a shared plan? 

- What is the influence of the computed relation on the dialogue structure, and 
how does this affect the reaction of the hearer? 
In this respect, we have to distinguish between the dialogue situation and the 
application situation: The dialogue situation is modified whenever speech acts 
are performed, whereas the application situation changes according to the effects 
of each action being executed. 

The main hypothesis of this paper is that modifications of the dialogue 
situation are triggered by changes of the application situation. As a response to 
a speech act, dialogue participants perform a series of actions aimed at achieving 
some goal. If these actions can be executed, the reaction can signal success. At 
this point, our understanding of the role of shared plans goes beyond that of [1]: 
Grosz and Kraus define an action to be resolved if it is assumed that an agent 
is able to execute the action. However, in order to understand coherence relations 
in complex dialogues, it is important to know whether an action has actually 
been executed and, if so, what effect it has produced. Consider the following 
excerpt from a MapTask dialogue (quoted from [2]): 



MAP 9 

R: +- 1- and ++ you are not quite horizontal you are taking a slight curve up 
towards um the swamp ++ not obviously going into it 
G: well sorry I have not got a swamp 
R: you have not got a swamp? 
G: no 
R: OK 
G: start again from the palm beach 



Obviously, G has failed to find the swamp, which means that G has failed to 
perform the action that is necessary for performing the next one (take a slight 
curve) in R’s previous utterance. Such cases, in which the system acting as a 
dialogue participant notices that a necessary step has not been reached, are not 
covered sufficiently by current approaches to dialogue understanding, although 
they occur very often. In communication between humans, an analysis depending on 
the current dialogue and application situation serves as the basis for 
continuing a dialogue in such a way that misunderstandings can be clarified. An 
example from the Trains corpus [3] illustrates a typical human reaction: 

9.1 M: so we should 
9.2 : move the engine 
9.3 : at Avon 
9.4 : engine E 
9.5 : to 
10.1 S: engine E1 
11.1 ... 
12.1 S: okay 
13.1 M: engine E1 
13.2 : to Bath 



This excerpt is a subdialogue to clarify a misunderstanding in identifying an 
object (engine E vs. E1). Finite-state approaches cannot foresee all repair 
states for every combination of an assumed engine and the one actually at the 
mentioned location. Even worse, the location itself may be wrong and have to be 
repaired. How could that be decided a priori? 

This paper elaborates an approach to plan-based dialogue understanding
that relies on efficient planning algorithms: we use the Planning Domain
Definition Language (PDDL) to model actions. The meaning of an utterance
is specified in terms of a satisfiable plan whose execution is monitored. The
convergence or divergence between the observed effects and the expected ones
determines the content and the function of subsequent speech acts. We use this
information to diagnose and clarify misunderstandings, as is the case
in human-human interaction.
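The monitoring step can be pictured with a small Python sketch. It assumes, for illustration only, that the expected and the observed effects of a plan step are sets of ground facts; `monitor_step` and `StepOutcome` are hypothetical names, not part of the system described here. Expected effects that are missing signal divergence (as when G cannot find the swamp), which would then motivate a clarification.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    step: str
    missing: set     # expected effects that were not observed (divergence)
    unexpected: set  # observed facts that the plan did not predict

    @property
    def converges(self) -> bool:
        # A step counts as successful only if every expected effect was observed.
        return not self.missing


def monitor_step(step: str, expected: set, observed: set) -> StepOutcome:
    """Compare the effects a plan step should have with what was actually observed."""
    return StepOutcome(step, missing=expected - observed, unexpected=observed - expected)


# Facts are plain tuples, e.g. ("visible", "G", "swamp").
outcome = monitor_step("take-slight-curve", {("visible", "G", "swamp")}, set())
if not outcome.converges:
    print("clarify:", sorted(outcome.missing))
```

Here `missing` names exactly the unmet expectation, which is the information a clarification question would be built from.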



1.2 Organisation of the Paper 

In section 2 we describe how knowledge is represented in our system. Section 3 
sketches how user intentions are recognized by analyzing natural language input. 
In section 4, we discuss the distinction between global and local factors. Section 
5 focuses on how follow-up contributions to the initial one may influence the 
update of a shared plan, and how this affects the relation between the new con- 
tribution and the previous dialogue. In section 6, we show that if the discourse 
relation of a new contribution is unclear, the dialogue can be continued by fo- 
cusing on a clarification. We claim in section 7 that decision procedures choose 
from several options that determine how the dialogue should be continued in 
such a situation. Finally, related work and conclusions are presented. 



1.3 An Example Dialogue 

The scenario we are currently using is an online B2B shop on the internet
(see [4]) where one can buy boxes of different types, sizes, colors, and materials.
The shop offers various functions to the user: one can search the database,
select articles and move them to the basket, request an offer and negotiate the
price, and finally accept or reject the offer. A typical dialogue between a user and
the system is shown in figure 1. Turn System 2 results from the pragmatic information
that the product specification given by the user was not completely matched.



System 1: Guten Tag und willkommen in unserem Shop! Was kann ich für Sie tun?
(Hello and welcome to our shop! How may I help you?)

User 1: Ich würde gerne 100 Boxen in Rot kaufen. (I want to buy 100 red boxes.)

System 2 (highlights an item in a list of proposed products): Ich empfehle Ihnen
diesen Artikel, denn er kommt Ihren Präferenzen am nächsten. (I recommend
this product, as it matches your preferences best.)

User 2: Okay.

System 3: Hier sehen Sie Ihren Warenkorb. Soll ich Ihnen dafür ein Angebot erstellen?
(Here is your basket. Shall I make you an offer for it?)

User 3: Ich möchte meine alten Aufträge sehen. (I want to see my previous orders.)

System 4: Bitte schön. (Here you are.)

User 4: Bitte mach mir ein Angebot! (Present me an offer, please!)

System 5: Ich kann Ihnen die Waren für 734 Euro 90 anbieten. Der Rabatt beträgt 6%.
(May I offer you the basket for 734 Euro 90? The price is 6% off the retail price.)

User 5: Das ist mir zu teuer. (That is too expensive for me.)

System 6: Ich kann Ihnen noch um 20 Euro entgegenkommen. (I can give you an
additional discount of 20 Euro.)

User 6: Ich nehme das Angebot an. (I accept your offer.)

System 7: Sie haben einen guten Kauf getätigt. Vielen Dank. (This is a good deal
for you. Thank you.)



Fig. 1. An example dialogue. 
This influences the text generator’s decision on how to verbalize a response
that does not affect the proposed offer negatively. In turn System 3, the system’s
initiative is motivated by the fact that the execution of the next step in the plan
that satisfies User 1 depends on the user’s willingness to continue the negotiation.
In turn User 3, the user does not respond directly to the question. However, the
dialogue system realizes that in the current application situation there is no
conflict between the goals of User 1 and User 3. Therefore, the focus of the dialogue
is shifted to the new goal. User 4’s utterance returns to the previous goal, and the
focus is shifted back again after the temporary parallel goal has been achieved.
User 5 is an indirect speech act by which the user starts negotiating the price.
The system realizes that the goal expressed by the utterance can be reached
if it is possible to modify the offer appropriately. This leads to turn System 6.
User 6 allows the conclusion that the plan has been executed successfully. System 7
signals that the plan for User 1 has been completed as well.
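One simple way to model the focus shifts in figure 1 (to the request for previous orders and back to the offer) is a stack of goals. The sketch below is an assumption made for illustration, not the mechanism of the described system; the `conflicts` predicate stands in for the check against the current application situation, and `FocusStack` is a hypothetical name.

```python
class FocusStack:
    """Hypothetical focus manager: the goal on top of the stack is in focus."""

    def __init__(self, conflicts):
        # conflicts(g1, g2) -> bool is an assumed predicate from the application model.
        self._conflicts = conflicts
        self._stack = []

    def focus(self):
        return self._stack[-1] if self._stack else None

    def new_goal(self, goal) -> bool:
        """Shift the focus to `goal` unless it conflicts with a pending goal."""
        if any(self._conflicts(goal, pending) for pending in self._stack):
            return False
        self._stack.append(goal)
        return True

    def achieved(self, goal) -> None:
        """Drop an achieved goal; the focus returns to the goal below it."""
        self._stack.remove(goal)


focus = FocusStack(conflicts=lambda a, b: False)
focus.new_goal("buy 100 red boxes")
focus.new_goal("show previous orders")   # temporary parallel goal
focus.achieved("show previous orders")
print(focus.focus())                     # focus is back on the purchase
```

A rejected `new_goal` call corresponds to the case where the new goal conflicts with the pending plan, which the paper later treats as blocking.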

2 A Domain Model for the Application 

Before explaining details of how dialogues like the one in figure 1 are processed, 
this section gives an overview of how linguistic and application specific knowledge 
is represented in the system. 

In order to be able to refer to functions and objects of the application in
natural language, we need a domain model. It is generated semi-automatically
from the specification of a Java API. This programming interface to the B2B
online shop defines classes and methods that can be accessed, instantiated, and
executed from outside the application. They are translated into concept definitions
in the Description Logic language ALC [5] and incorporated into SUMO
(Suggested Upper Merged Ontology, see [6]), as shown in the following example
for the class offer:

class offer {
  T_NEW_CATALOG has-catalog;
  DISCOUNT has-discount;
  PRICE has-price;
  OfferStatus has-status;
  PRICE offer();
  void offerconfirmation(DISCOUNT has-discount, PRICE has-price);
};

The fields of the class are translated into role restrictions of the concept which 
stands for the class itself. The translation is recursive for the classes that serve 
as data types of the translated fields. The concept for offer is a subconcept of 
the SUMO concept Requesting in the following concept definition: 

(define-primitive-concept
  offer (and Requesting
             (all has-catalog T_NEW_CATALOG)
             (all has-discount DISCOUNT)
             (all has-price PRICE)
             (all has-status OfferStatus)))
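The field-to-role translation can be sketched in a few lines of Python. This is an illustrative assumption rather than the actual generator: `class_to_concept` is a hypothetical helper, field types are passed as plain strings, and the recursive translation of the field types themselves is omitted.

```python
def class_to_concept(name: str, parent: str, fields: dict) -> str:
    """Render a primitive concept whose role restrictions mirror a class's fields.

    `fields` maps each field (used as the role name) to the class serving as its type.
    """
    restrictions = " ".join(f"(all {role} {type_name})" for role, type_name in fields.items())
    return f"(define-primitive-concept {name} (and {parent} {restrictions}))"


offer_fields = {
    "has-catalog": "T_NEW_CATALOG",
    "has-discount": "DISCOUNT",
    "has-price": "PRICE",
    "has-status": "OfferStatus",
}
print(class_to_concept("offer", "Requesting", offer_fields))
```

Because Python dictionaries preserve insertion order, the role restrictions appear in the same order as the fields of the class.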
Methods, in turn, are translated into planning operators in the PDDL planning
language, as shown below:

(:action offerconfirmation
  :parameters (?t - T_NEW_CATALOG ?p - PRICE ?o - offer)
  :precondition (and (has-catalog ?o ?t) (has-price ?o ?p)
                     (not (has-status ?o confirmed)))
  :effect (has-status ?o confirmed))

The method offerconfirmation changes the status of an offer to confirmed
if this has not been done already. A domain model of this kind allows for
simulating the effects of a natural language utterance, as will be discussed
below.
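To show what simulating an operator amounts to, the following Python sketch grounds offerconfirmation and applies it STRIPS-style to a state given as a set of facts. The `GroundAction` type and the object names `o`, `t`, `p` are hypothetical; a real PDDL planner would perform the grounding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundAction:
    name: str
    pos_pre: frozenset                # facts that must hold
    neg_pre: frozenset = frozenset()  # facts that must not hold
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def applicable(action: GroundAction, state: frozenset) -> bool:
    return action.pos_pre <= state and not (action.neg_pre & state)


def apply_action(action: GroundAction, state: frozenset) -> frozenset:
    """Return the successor state (STRIPS semantics: delete first, then add)."""
    if not applicable(action, state):
        raise ValueError(f"preconditions of {action.name} do not hold")
    return (state - action.delete) | action.add


# offerconfirmation grounded with hypothetical objects o (offer), t (catalog), p (price).
confirm = GroundAction(
    name="offerconfirmation t p o",
    pos_pre=frozenset({("has-catalog", "o", "t"), ("has-price", "o", "p")}),
    neg_pre=frozenset({("has-status", "o", "confirmed")}),
    add=frozenset({("has-status", "o", "confirmed")}),
)

s0 = frozenset({("has-catalog", "o", "t"), ("has-price", "o", "p")})
s1 = apply_action(confirm, s0)
print(applicable(confirm, s1))  # False: the offer is already confirmed
```

The negative precondition is what prevents the operator from firing twice, mirroring the `(not (has-status ?o confirmed))` clause.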



3 Interpretation of Natural Language Utterances 

The interface between the shop and the user is implemented by configuring a 
natural language dialogue system to meet the needs of the class of dialogues that 
can be conducted with the online shop. Natural language input (either via speech 
or via keyboard) is parsed by the system described in [7]. The parser’s output - a 
discourse representation structure (DRS) - is interpreted by the dialogue system 
as a description of the goal the user wants to reach. Guided by the principle of 
cooperativity, the dialogue system tries to compute a shared plan for that goal. 
For that purpose, the planner described in [8] is applied to find out whether it 
is generally possible to find a plan for the user’s goal. For the user utterance 

User: I want to buy 100 red boxes. 

the plan in figure 4 is computed. The associated goal is shown in figure 2. It 
encodes the functional semantics of buy. If something is to be bought, an offer 
for it has to be accepted. This commonsense meaning of buy is the illocutionary 
force of the verb; the most important consequence of this observation is that any 
user utterance only indicates and describes transactions, but does not perform 
them. All information in an utterance is considered to be hypothetical, since it
has to be verified that boxes can be bought at the shop, that red boxes are
available, and that the required quantity does not exceed the number of currently
available boxes.
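The verification of these hypotheses can be illustrated with a Python sketch against a toy catalog. The `Article` record, its attributes, the function name `unverified_hypotheses`, and the rule that a single article must cover the whole quantity are all assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical catalog record; attribute names are illustrative only.
    kind: str
    color: str
    stock: int


def unverified_hypotheses(catalog, kind, color, quantity):
    """Return the hypotheses of a purchase request that the catalog cannot confirm."""
    of_kind = [a for a in catalog if a.kind == kind]
    if not of_kind:
        return [f"unknown article type {kind!r}"]
    in_color = [a for a in of_kind if a.color == color]
    if not in_color:
        return [f"no {kind} available in {color}"]
    # Assumption: one article has to cover the whole requested quantity.
    if max(a.stock for a in in_color) < quantity:
        return [f"insufficient stock: {quantity} requested"]
    return []


catalog = [Article("box", "red", 80), Article("box", "blue", 500)]
print(unverified_hypotheses(catalog, "box", "red", 100))
```

The checks run in dependency order: there is no point in checking colors for an unknown article type, so the first failed hypothesis is reported.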

Furthermore, the commonsense meaning has to be interpreted in terms of the 
application domain model. This is done firstly by deriving a formal description 
from the hypotheses indicated by the utterance (see the already mentioned goal 
in figure 2). Secondly, a plan for that goal has to be found. If this can be achieved,
it is in principle possible to obtain the effects indicated by the user’s utterance,
provided that each step of the plan is carried out successfully.

For planning, an initial situation is needed as a representation of the cur- 
rent state of affairs. In the interactive environment, there are three sources of 
information that add facts to the initial situation: 
A = [ o, s, t, p, q, f, a |
      Offer(o),
      has-status(o, s), Confirmation(s),
      has-catalog(o, t), T-New-Catalog(t),
      has-product(t, p), Product(p),
      has-quantity(p, q), Quantity(q), value(q, 100), Number(100),
      has-article(p, a), Article(a), has-feature(a, f), feature(f),
      has-FNAME(f, CAA074001), fname(CAA074001),
      has-FVALUE(f, red), fvalue(red) ]

Fig. 2. DRS for the goal derived from the utterance I want to buy 100 red boxes.



(define (problem transaction342)
  (:objects u1 - T_NEW_CATALOG t - T_NEW_CATALOG a - article
            ba342 - Basket o - Offer b342 - BMOResult n342 - UM-Need
            p - product q - quantity p342 - price s - Confirmation
            f - feature)
  (:init (has-content n u1) (has-product u1 p)
         (has-quantity p q) (value q 100) (has-article p a)
         (has-feature a f)
         (has-FNAME f CAA074001) (has-FVALUE f red)))

Fig. 3. Initial situation for the utterance I want to buy 100 red boxes. 

— The current state can be determined by the actions executed so far. 

— The user’s utterance contains hypothetical propositions about the state of 
the shop that have to be consistent with what is known from the record of 
the previous actions. 

— The plan for reaching the indicated goal contains hypothetical objects and 
assertions about them whose existence or correctness cannot be proven until 
the plan is (at least partially) executed. 

The last point implies that the closed world assumption normally
made by PDDL planners can be violated: first, since new objects can be introduced
in the course of the negotiation, not all objects exist in the initial situation.
Second, as a consequence, missing information cannot be considered false (by
negation as failure). Therefore, the initial situation for planning will contain
hypothetical objects and assertions to be validated later. The initial situation
for the example above is shown in figure 3.
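The difference from negation as failure can be made concrete with a three-valued store. In the following Python sketch (the names `Truth` and `OpenState` are hypothetical), a fact that is neither asserted nor refuted is reported as unknown rather than false, and it is settled only when plan execution yields an observation.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


class OpenState:
    """Facts are asserted, refuted, or still hypothetical (unknown)."""

    def __init__(self, true_facts=(), false_facts=()):
        self.true = set(true_facts)
        self.false = set(false_facts)

    def value(self, fact) -> Truth:
        if fact in self.true:
            return Truth.TRUE
        if fact in self.false:
            return Truth.FALSE
        # No negation as failure: an absent fact is not taken to be false.
        return Truth.UNKNOWN

    def observe(self, fact, holds: bool) -> None:
        """Settle a hypothetical fact once plan execution reveals its value."""
        (self.true if holds else self.false).add(fact)
        (self.false if holds else self.true).discard(fact)


state = OpenState(true_facts={("value", "q", 100), ("has-FVALUE", "f", "red")})
print(state.value(("has-status", "o", "confirmed")))  # Truth.UNKNOWN
```

A closed-world planner would collapse `UNKNOWN` into `FALSE`; keeping them apart is what allows hypothetical assertions to be validated later.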

Since the information about the user’s intentions is underspecified at the mo- 
ment when a plan is computed, the dialogue system interleaves the transactions 
which have to be performed by the online shop (steps 1, 3, 5) with interactive 
tasks (steps 2, 4, 6). When they are performed by the dialogue system itself,
system utterances that are expected to achieve the current interactive task are 
produced (e.g. in step 2: obtaining a selection of articles that will be added to the 
user’s shopping cart in step 3). During the plan execution, the dialogue system 
verifies whether the current situation allows the next step in the plan. For that 
purpose, it distinguishes between an interactive and an application-oriented (i.e. 




Bernd Ludwig 



1: PRODUCTSEARCH N U1 TB B SVEN
2: REQUEST S SVEN TS1
3: ADDTOBASKET S SVEN TB TS1 BA
4: QUERY-IF R YES SVEN
5: OFFERREQUEST N TS1 O1 SVEN
6: OFFERACCEPTANCE N TS1 O1

Fig. 4. Shared plan for the utterance I want to buy 100 red boxes. This plan is the 
situation dependent meaning of the utterance. 

(:action request
  :parameters (?st - SystemTurn ?n - UM-Selection
               ?u - SIKOWO-DiscretePerson ?tn - T_NEW_CATALOG)
  :precondition (and (about ?st ?n)
                     (has-content ?n ?tn)
                     (evoc-fun ?st doaction))
  :effect (when (evoc-fun ?st doaction)
            (exists (?ut - UserTurn)
              (and (expr-fun ?ut doaction)
                   (expressed ?ut ?u)
                   (coherent ?ut ?n)))))

Fig. 5. Definition of the interactive task REQUEST in PDDL. 

a transaction) task, because either the state of the transaction or that of the 
dialogue has to be consulted for the verification. 

4 Discourse and Application Pragmatics 

The plan in figure 4 highlights the fact that the dialogue model presented here
draws a clear distinction between transactional and interactive tasks. While the
first type of task aims at an explicit symbolic representation of transactions to
be executed by the shop, the second type represents requests to the dialogue
system for the interaction with the user during the execution of a shared plan.
The transactional tasks are formalized in terms of PDDL plan operators; as
a consequence, all applications whose functionality can be expressed in PDDL
are suited for integration into the outlined dialogue model. Interactive tasks are
represented in PDDL as well and can therefore be integrated into planning. The
specification of REQUEST that is used in figure 4 is shown in figure 5. The effects
of interactive tasks are taken into account by referring to the information state
of the current interaction between user and system. For this purpose, a-modal
constraints such as (expressed ?ut ?u) and (coherent ?ut ?n) are used to verify
that the necessary information is available for planning the next step towards
the user goal, as in the following excerpt of the preconditions for ADDTOBASKET:

:precondition
  (and (coherent ?ut ?n)
       (has-catalog ?n ?ts)
       (expressed ?ut ?u))
4.1 Constructing Meaning

The interactive tasks are executed by the dialogue system itself; they can be
specified in detail in a proprietary programming language, which allows tasks to be
mapped to speech acts depending on the current dialogue situation (see section
4.2). Interactive tasks determine the way in which the user contributes to the
construction or execution of a shared plan. In general, there are three different
types of contributions and therefore three classes of interactive tasks:

— Modification of information 

— Querying of information 

— Execution of actions 

In cooperative dialogues, interactive tasks are completed when certain ex- 
pectations are met - according to the class a task belongs to: 

— Modification: Can the modification be performed without conflicts with the 
available knowledge? 

— Querying: The queried information has to be computed. 

— Execution: Can a shared plan be found and executed? 
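The class-specific completion criteria listed above can be written down as a lookup table of predicates. The Python sketch below is illustrative only: `TaskClass`, `EXPECTATIONS`, `completed` and the outcome keys (`conflicts`, `answer`, `plan_found`, `executed`) are hypothetical stand-ins for whatever the dialogue system actually records.

```python
from enum import Enum


class TaskClass(Enum):
    MODIFICATION = "modification"
    QUERYING = "querying"
    EXECUTION = "execution"


# Hypothetical completion checks; each receives a dict describing the task's outcome.
EXPECTATIONS = {
    # Modification: performed without conflicts with the available knowledge.
    TaskClass.MODIFICATION: lambda r: not r.get("conflicts"),
    # Querying: the queried information has been computed.
    TaskClass.QUERYING: lambda r: r.get("answer") is not None,
    # Execution: a shared plan was found and executed.
    TaskClass.EXECUTION: lambda r: r.get("plan_found", False) and r.get("executed", False),
}


def completed(task_class: TaskClass, outcome: dict) -> bool:
    """An interactive task is completed once its class-specific expectation is met."""
    return bool(EXPECTATIONS[task_class](outcome))
```

Keeping the checks in a table makes it easy to add further task classes without touching the dispatch code.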

Depending on the type of information that forms the content of the interactive
task (e.g. the intensional or extensional knowledge a task refers to), the knowledge
about plan operators, terminological definitions, linguistic (lexical, grammatical)
information, or the knowledge of the current dialogue and application situation
can be the subject of interactive tasks. In the class of dialogues discussed in
this paper, interaction with the user can only be about the current situation. In
such an approach to dialogue, natural language utterances fulfill two functions:

— They indicate an interactive task (e.g. by cue phrases and syntactic means 
as questions or imperatives). 

— They indicate transactional tasks or information about the current applica- 
tion situation. 

On this basis, utterances and dialogues can be analyzed by determining hypotheses
for the situation-based functions of each new utterance. In order to decide
whether an utterance can be incorporated into a dialogue, the preconditions for
a hypothesis have to be verified in the dialogue situation for the interactive task
and in the application situation for the transactional task. In the example above,
buy 100 red boxes indicates a transaction, while I want to communicates
new information about the speaker’s intention. This leads to the DRS in figure
6, in which intention(i) captures the meaning of the modal verb want.

4.2 Acting and Reacting in a Dialogue 

As a consequence of the described approach to semantics, plans have to be 
computed for discourse-pragmatic goals as well as for transactional goals. Possible
goals are specific to each class of interactive tasks and defined by the
[ i | intention(i), content(i, A) ]

Fig. 6. Formal meaning of the utterance I want to buy 100 red boxes. A is defined in figure 2.



expectations listed above. To meet the goal assigned to the example utterance, 
intention(i) needs to be verified in the current dialogue situation. 

For that purpose, a plan that determines the reaction to the new contribution 
has to be computed (see figure 7): for building the plan, operators are applied 
that encode the global factors mentioned in the introduction. The most impor- 
tant one is cooperativity. To verify whether the new intention is satisfiable, a 
shared plan is constructed for the user action buy: The user can buy something at 
the online shop, if in the current application situation a plan can be found for the 
presentation of an offer about the requested products (see figure 4). This means 
that the dialogue system tries to find a plan (in the application domain) that 
satisfies the transactional task indicated in the user utterance. The next step in 
reacting to the user utterance is to execute the plan. Finally, if the execution 
succeeded, the completion of the reaction is signaled by ACCEPT. ACCEPT is - as 
REQUEST and QUERY-IF in figure 4 are - a basic dialogue operation that maps 
interactive tasks to (sequences of) speech acts. Which speech acts are generated 
in order to verbalize the interactive task depends on the requirements of the 
current dialogue situation. In human communication, the decomposition of discourse
pragmatic goals into speech acts and their verbalization is influenced to a
large extent by a number of factors (see [9, 10]), e.g. topicalization, stylistic
variability, relations between dialogue participants, availability and limitations
of resources, or cognitive capacity and personality of the dialogue participants.
To enable interactive tasks to consider these factors in generation of speech acts, 
all basic dialogue operations are programmable and in this way can take the 
current dialogue situation into account. 

The implementation of ACCEPT in figure 8 considers limitations of the cognitive
capacity of the hearer: by summing over all discourse referents t in the contribution,
it computes the relevance of the contribution’s content and its volume.
By relating the volume to the time elapsed since the last utterance in the dialogue, a
“speed of information” is calculated. If it is too high, a speech act is generated
only if ACCEPT is important enough.

The example shows that utterances are not directly related to interactive 
tasks. In order to reach a discourse pragmatic goal, factors not determined by 
the content of a task have to be taken into account. 

1: FIND-PLAN BUY P 
2: EXECUTE-PLAN P 
3: ACCEPT BUY 

Fig. 7. Discourse plan for the utterance I want to buy 100 red boxes. 




A Pragmatics-First Approach to the Analysis and Generation of Dialogues 






proc ACCEPT (task contrib);
  drs utterance = content(contrib);
  float rel = 0.0, vol = 0.0, speed, dist, curr_time;
begin
  forall referent t in utterance do
    vol += consumption(utterance, t), rel += priority(utterance, t);
  curr_time := time(); dist := curr_time - last_time; last_time := curr_time;
  speed := vol/dist; old_speed := speed;
  if speed < 0.25 then utter contrib;
  else if rel > 4.0 then utter contrib;
end;

Fig. 8. Implementation of the dialogue operation ACCEPT. 
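For readers who want to experiment, the following is a Python transcription of the logic in figure 8. It is a sketch under assumptions: `consumption` and `priority` are supplied as hypothetical scoring functions, referents are passed as a plain list, and the clock is injectable so that the behavior can be tested. The two thresholds (0.25 and 4.0) are the ones given in figure 8.

```python
import time


class AcceptOperation:
    """Python sketch of the ACCEPT logic in figure 8."""

    SPEED_LIMIT = 0.25    # threshold from figure 8
    PRIORITY_LIMIT = 4.0  # threshold from figure 8

    def __init__(self, consumption, priority, clock=time.monotonic):
        # consumption(referents, t) and priority(referents, t) are assumed scoring functions.
        self.consumption = consumption
        self.priority = priority
        self.clock = clock
        self.last_time = clock()

    def should_utter(self, referents) -> bool:
        vol = sum(self.consumption(referents, t) for t in referents)
        rel = sum(self.priority(referents, t) for t in referents)
        now = self.clock()
        dist = max(now - self.last_time, 1e-9)  # avoid division by zero
        self.last_time = now
        speed = vol / dist  # "speed of information"
        # Utter if the hearer can absorb it, or if it is important enough anyway.
        return speed < self.SPEED_LIMIT or rel > self.PRIORITY_LIMIT


ticks = iter([0.0, 10.0, 11.0])
accept = AcceptOperation(lambda u, t: 1.0, lambda u, t: 1.0, clock=lambda: next(ticks))
print(accept.should_utter(["o", "p"]), accept.should_utter(["o", "p"]))
```

With unit scores for two referents, a contribution arriving 10 time units after the previous one has a speed of 0.2 and is uttered; one arriving 1 unit later (speed 2.0, relevance 2.0) is suppressed.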




Fig. 9. Chart of discourse relations after the new turn has been shown to be irrelevant to the
current focus.



5 Building up the Discourse Structure 

The preceding section explained that discourse structures are built during the
execution of discourse pragmatic plans. The discourse structure is a graph whose
edges relate interactive tasks (the nodes of the graph). An edge between two
nodes determines in which way the later interactive task contributes to the
completion of the earlier one, as in [11]. From the viewpoint of the discourse model
presented in this paper, the relation of satisfaction-precedence holds between two
tasks if they are subsequent steps in a plan. For example, QUERY-IF (step 4) in figure 4
is satisfaction-preceded by REQUEST (step 2): after the online shop finds articles
in its database and presents them to the user (step 1), it requests the user to
select some of them for the shopping cart (step 2). The shop cannot ask the
user whether he wants an offer (step 4) until steps 2 and 3 are completed. The relation
of dominance holds between two tasks if the dominated task contributes to the
completion of the dominating one. For example, all user responses are dominated
by interactive tasks initiated by the system. Figure 9 shows the discourse structure
after step 2. Satisfaction-precedence is visualized by going up the y axis,
while dominance holds between tasks if they appear along a horizontal line.
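Such a graph can be represented with labeled edges, as in the following Python sketch. The class `DiscourseStructure` and the task labels (such as `REQUEST#2` and `USER-SELECTION#3`) are hypothetical; the check in `expectation_met` encodes the paper's statement that an interactive task's goal is achieved once it dominates another interactive task.

```python
from collections import defaultdict

DOMINANCE = "dominance"
SATISFACTION_PRECEDENCE = "satisfaction-precedence"


class DiscourseStructure:
    """Nodes are interactive tasks; labeled edges say how a later task relates to an earlier one."""

    def __init__(self):
        self._edges = defaultdict(set)  # earlier task -> {(relation, later task)}

    def relate(self, earlier, relation, later) -> None:
        self._edges[earlier].add((relation, later))

    def holds(self, earlier, relation, later) -> bool:
        return (relation, later) in self._edges.get(earlier, set())

    def expectation_met(self, task) -> bool:
        # A task's goal counts as achieved once it dominates another task.
        return any(rel == DOMINANCE for rel, _ in self._edges.get(task, set()))


ds = DiscourseStructure()
ds.relate("REQUEST#2", SATISFACTION_PRECEDENCE, "QUERY-IF#4")
ds.relate("REQUEST#2", DOMINANCE, "USER-SELECTION#3")
```

Using `dict.get` in the queries keeps lookups from silently inserting empty entries into the `defaultdict`.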
The goal of an interactive task is achieved if it dominates another interactive
task; in this case, its expectation is met. For the case in which the expectations
of an interactive task are not met, consider the continuation of the dialogue in
section 2:

User: I want to buy 100 red boxes. 

System: You may now choose from the articles in the presented selec- 
tion. Which of them would you like? 

User: I need containers. 

In this example, even the reaction of the dialogue system (step 2) violates the
expectation assigned to the interactive task (step 1) that has been initiated by
the user. In order to react appropriately, the user has to analyze the system’s
utterance and understand that it indicates an admissible step in a plan for the
request. Analogously, the system has to analyze whether user utterances
constitute expected reactions that allow the next step in the shared plan to be
executed.

Assuming that boxes and containers are two different types of articles, the user
does not choose any article from the proposed selection with the utterance “I need
containers.”, so the goal for the second user utterance is incompatible with the
shared plan for the first utterance. Neither satisfaction-precedence nor dominance
holds between the system’s and the user’s second utterance. Which other discourse
relation holds between them?

6 Logical and Discourse Relations 

To answer this question, it is useful to analyze relations that possibly
hold between the transactional tasks assigned to both utterances. The
implementation of the online shop may allow relations other than
satisfaction-precedence, which requires the first task to be completed before the
second one can start. If the shop is capable of presenting multiple windows for
selections at the same time, it can handle two transactions concurrently.

Concurrent execution of plans can be modeled with the help of the planner
described in [8] (see section 3). The planner computes a partial order of the actions to
be executed for a goal: if more than one action is specified in a step, all
actions in this step may be executed in parallel. Consequently, to test whether
two interactive tasks are related by in-parallel, one has to compute
a plan for the conjunction of the associated transactional tasks. If there is such a plan,
in-parallel possibly holds (see section 7).
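The approach above tests in-parallel by planning for the conjunction of both goals. As a cheaper illustration, the Python sketch below substitutes a Graphplan-style interference check between the steps of two plans; this is a different technique from the one described, and the `Step` type, the fact names and the single-selection-window shop are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical ground step with precondition, add and delete lists of facts.
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def interfere(a: Step, b: Step) -> bool:
    """Steps interfere if one deletes a precondition or an add effect of the other."""
    return bool(a.delete & (b.pre | b.add)) or bool(b.delete & (a.pre | a.add))


def can_run_in_parallel(plan_a, plan_b) -> bool:
    """Approximate in-parallel test: no step of one plan interferes with a step of the other."""
    return not any(interfere(a, b) for a in plan_a for b in plan_b)


# Assumed shop with a single selection window: showing a selection occupies it.
WINDOW_FREE = ("window", "free")
show_boxes = Step("show-boxes", frozenset({WINDOW_FREE}),
                  frozenset({("shows", "boxes")}), frozenset({WINDOW_FREE}))
show_containers = Step("show-containers", frozenset({WINDOW_FREE}),
                       frozenset({("shows", "containers")}), frozenset({WINDOW_FREE}))
show_orders = Step("show-orders", frozenset({("logged-in", "user")}),
                   frozenset({("shows", "orders")}))

print(can_run_in_parallel([show_boxes], [show_containers]))  # False: blocking
print(can_run_in_parallel([show_boxes], [show_orders]))      # True
```

Pairwise interference is only a necessary condition for parallel execution; a full planner, as used in the paper, also accounts for interactions across longer action sequences.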

If no plan for execution in parallel can be found, the second utterance (“I
need containers.”) blocks the execution of the current shared plan (see figure 4).
In this case, as with in-parallel, neither satisfaction-precedence nor dominance
correctly expresses the relationship between interactive tasks. As [12] observes,
additional discourse relations are necessary to adequately describe the discourse 
structure if a conflict occurs when an utterance is integrated into the discourse 
structure. 
[Figure 10: chart of the discourse structure; x axis: dialogue step (1 to 13); y axis: satisfaction precedence; nodes labeled “Which boxes? Expected: Modification” and “Cancel Boxes? Expected: Modification”.]

Fig. 10. Discourse structure resulting from a block relation between the last utterances.



In contrast to [12], this approach distinguishes between discourse relations,
which statically describe a state of affairs, and discourse pragmatic plans,
which devise a way to continue a dialogue: if, during the analysis of the second
user utterance (“I need containers.”), it turns out that the new contribution
blocks the completion of the current shared plan, some way out of this conflict
must be found. In figure 10, the discourse structure is shown for the following
continuation of the dialogue:

System: You cannot browse boxes and containers at the same time. 

System: Do you want to stop selecting boxes? 

In our analysis, the first utterance is dominated by the user request, as it
completes the associated interactive task (by indicating failure). The second
utterance satisfaction-precedes everything else, as without an answer the system
cannot decide how to complete the pending transactions. In Asher’s terms, a
correction relation would hold between dialogue steps 4 and 5, while in our view
CORRECTION is the interactive task that was performed and resulted in the discourse
structure of figure 10. CORRECTION can be performed because a discourse
pragmatic plan was found that allowed the dialogue system to ask the user for
information in parallel to the pending query to choose items from the selection
for the basket.

In substance, Asher’s analysis and ours do not differ; however, the distinction
between relations and tasks is better suited for an efficient implementation of a
dialogue model that allows for the analysis and generation of dialogue turns.

The user can react to the system’s attempt to resolve the blocking of the 
shared plan by 

User: Yes. 

In this case, the dialogue system can complete its plan and resolve the blocking 
by canceling the blocked shared plan. Now, a plan for satisfying the transactional 
task associated with dialogue step 4 can be found and executed. 
$$V(\textit{in-parallel} \mid H, A) = V(\mathrm{expr}(H) \neq \mathrm{evoc}(A) \mid H, A) \cdot V(\mathrm{parallel}(A, H) \mid H, A) \cdot V(\mathrm{anchor}(A, H) \mid H, A)$$

$$V(\textit{blocking} \mid H, A) = V(\mathrm{expr}(H) \neq \mathrm{evoc}(A) \mid H, A) \cdot V(\mathrm{blocked}(A, H) \mid H, A) \cdot V(\mathrm{anchor}(A, H) \mid H, A)$$

Factors: (1) Does the task type match the expectations? (2) Can the transactions run in parallel? (3) Is A a good anchor for H?

Fig. 11. Valuation of discourse relations.



7 Deciding How to React 

Depending on certain constraints in the application situation, when expecta- 
tions of interactive tasks are not met, the logical relation between the associated 
transactional tasks may lead to an ambiguous discourse relation between a new 
contribution and the previous dialogue. 

In this case, a decision has to be made as to which of the possible interpretations
to choose. This decision is based on a valuation of each option in the current
situation. Figure 11 shows how the options under consideration (in-parallel and
blocking) are scored. Three factors are taken into account: first, the type
expr(H) of the interactive task H should not meet the expectation evoc(A) of the
task A whose shared plan is currently being executed; second, the transactional
tasks associated with A and H should be executable concurrently (for in-parallel)
or block each other (for blocking); and third, A should be a good anchor for H
(ideally, A is the current focus). On the basis of this valuation, the following
decision rule is applied:

$$\mathrm{relation}(H, A) = v \iff v = \arg\max_{x \in \{\textit{in-parallel},\, \textit{blocking}\}} V(x \mid H, A)$$

This rule selects the hypothesis with the best valuation. If there is no unique v, 
then a CLARIFICATION is started which generates a query to the user to choose 
an alternative: 

System: Do you want to browse boxes and containers as well? 

This example shows that the blocking of a discourse pragmatic plan is handled in the
same way as the blocking of a transactional task. The dialogue model allows user and
system to start negotiating propositions about the dialogue situation that are viewed
as the cause of a discourse pragmatic plan being blocked.
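The decision rule can be sketched in Python as follows. How the individual factors of figure 11 are scored is not specified here, so the sketch simply assumes that each relation comes with three numeric factor scores whose product is its valuation; `decide`, the tolerance `eps` and the returned label `CLARIFICATION` for a tie are illustrative choices.

```python
from math import prod


def decide(factors: dict, eps: float = 1e-9) -> str:
    """Pick the relation with the best valuation, or ask for clarification on a tie.

    `factors` maps a relation name to its three factor scores (expectation test,
    transaction test, anchoring); the valuation is assumed to be their product.
    """
    values = {relation: prod(scores) for relation, scores in factors.items()}
    best = max(values.values())
    winners = [relation for relation, value in values.items() if abs(value - best) <= eps]
    return winners[0] if len(winners) == 1 else "CLARIFICATION"


print(decide({"in-parallel": (0.9, 0.8, 1.0), "blocking": (0.9, 0.2, 1.0)}))
print(decide({"in-parallel": (1.0, 0.5, 1.0), "blocking": (1.0, 0.5, 1.0)}))
```

The tolerance guards against floating-point noise turning a genuine tie into an arbitrary choice; a tie is exactly the situation in which the system should ask the user instead.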

8 Related Work and Conclusions 

Theoretical foundations for our work are the results in discourse analysis pre- 
sented in [13] on the influence of plans on discourse structure. [14, 15] show that 
the interactive nature of dialogues makes it necessary to coordinate plans of all 
participants involved in the dialogue. However, the conversational principles a
participant is obliged to, as well as the knowledge he has about the application
domain and about the current situation, are not accessible to the discourse
pragmatic reasoning of other dialogue participants. They can only be reconstructed
partially from utterances. In order to enable collaboration, the reconstructed
information must be aligned with the information in the system’s knowledge base.
The alignment is achieved by continuously comparing expectations in plans
with observations derived from utterances and the execution of actions. When
a mismatch is found, plans for interactive tasks are constructed to obtain
information that is sufficient to decide whether a realignment is possible or not.
From this abstract point of view, it makes no difference whether a task is
related to application or discourse pragmatics. This advantage follows from the
distinction between application and dialogue domain and the observation that
the same planning and plan execution approach can be applied to both domains.

The described approach is implemented in a dialogue system for spoken
language. The system was presented to the public during
SYSTEMS in October 2003. The implementation demonstrates the tractability of the
outlined approach: the prototype system performs in real time. For the future,
recovery from failed plans has to be studied in more detail. As corpus analyses 
show, repair strategies are influenced by application specific knowledge. We will 
explore how such strategies can be specified for various dialogue applications. 

[16] presents a detailed analysis of the dialogue engine used in the TRINDI 
and TALK projects. Larsson’s notion of accommodation is similar to what 
was called alignment above. However, recovery from critical situations is not 
discussed. Dialogue and application domain remain undistinguished. 

[17] focuses on the issue of system architecture; a detailed theoretical model 
for dialogues is not in the scope of Allen’s paper. 

A logic-based approach to dialogue understanding is presented in [18]. Our
approach differs from that of Bos and Oka by separating the dialogue situation from
the application situation instead of relying on a discourse memory only. Beyond
that, by incorporating a planner as a reasoning mechanism in addition to model
building, we can handle different reasoning problems with the help of appropriate
tools. Finally, we discuss the selection among and execution of configurable
interactive strategies when expectations of dialogue participants are violated.
Abstract. Natural Language Generation (NLG) systems have now almost
reached the state of “market-readiness”, mostly because hybrid systems
of different types have emerged as a de facto standard. Still,
relatively few dialog systems make use of NLG techniques.

In this paper, we discuss the output part of our spoken language dialog 
system by presenting an example scenario including dialog management, 
NLG, and speech synthesis. Our approach to hybrid NLG couples shallow
and deep processing with respect to the linguistic and pragmatic system
resources as well as on the architectural level, and thus increases processing
efficiency (compared to pure deep generation) as well as generative
power (compared to pure shallow generation). Our system has been
applied to three different domains, namely home A/V management, model
train control, and B2B e-procurement.

Keywords: Hybrid Natural Language Generation, Spoken Language Dialog 
System, Bottom-up Generation 



1 Introduction 

There are different ways of realizing the system output part of a human-computer 
interface, including mail-merge, human authoring and graphics (cf. [1]). Among 
these, we think NLG, and, more specifically, hybrid NLG is the most promising 
approach for dialog systems, because it can provide a unique combination of 
processing efficiency and linguistic coverage. 

In this paper, we discuss our variant of hybrid NLG which is embedded in our 
spoken language dialog system. The paper is organized as follows: In sect. 2, we 
will introduce our notion of hybrid NLG which is somewhat different from current 
approaches. Sect. 3 deals with the system core of Hyperbug, our hybrid NLG 
system which couples and interleaves shallow and deep processing with respect 
to system resources and architecture to increase processing efficiency as well as 
generative power. The part of dialog management which is relevant for system 
output specification is described in sect. 4. In sect. 5, we discuss a relatively 
simple, but nevertheless illustrative example to demonstrate how system output 
is accomplished in our system; we will also describe the system components 
involved in more detail there. Finally, we address relevant work in the field of 
hybrid approaches for NLG and future work for our NLG system in sect. 6. 



S. Biundo, T. Frühwirth, and G. Palm (Eds.): KI 2004, LNAI 3238, pp. 97-111, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 
2 Hybrid NLG Systems 

Several notions of hybridism in the context of NLG systems exist, including 
stochastic approaches [2], the use of machine learning (ML), as in [3] and more 
recently in [4], and generation using XSLT [5]. We will instead use a rather 
traditional notion of hybridism: Our definition concentrates on a mixing strategy 
for the two classical approaches, shallow and deep generation. [6] describes two 
types of such hybrid NLG systems: Type I consists of shallow generation with deep 
elements, type II of deep generation with shallow elements. We propose adding a 
third type to this typology: Type III uses separate shallow and deep processing 
branches and combines the results appropriately, in analogy to the approach taken 
in Verbmobil [7] for the language analysis part of the system. 
This analogy is certainly not complete: In Verbmobil, several different analysis 
branches process the input simultaneously. After that, a module called Integrated 
Processing combines the results to obtain a single semantic representation, the 
Verbmobil Interface Term (VIT), for further processing. In our system, as we 
will describe further in sect. 3, the decision between the different generation 
branches is made before actual processing is done. 

However, if you model analysis and generation as inverse operations 1, you can 
motivate the different positions of the decision module in the Verbmobil analysis 
part and in our generation system by saying that we simply mirror the Verbmobil 
architecture: What comes last in analysis (combining several syntactic and semantic 
representations into one single structure) must come first in generation 
(expanding one single semantic structure into several output specifications for 
the different processing branches). There is, of course, a different, more obvious, 
and more practical argument for putting the decision module in front of the 
generation process: We want to save the processing time that would be wasted by 
feeding the same input structure into several processing branches which are 
essentially capable of producing the same results. 

On a theoretical level, shallow and deep processing are not far apart: Basically, 
a template system allowing recursive templates has the same expressive 
power as a context-free grammar. Conversely, if you restrict a grammar 
to non-recursive rules, you can compile grammar and lexicon into a single 
template system. In practice, however, neither of these extreme approaches is 
normally taken: There are situations when fast and simple system answers are 
needed, and there are also situations which require more elaborate answers. In a 
standard hybrid approach, the discourse and application domains determine which 
mixture of shallow and deep generation techniques is most appropriate. It would 
be more desirable, however, to allow this mixing strategy to be tailored to the 
specific dialog situation at hand. 
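The claim that recursive templates match context-free grammars in expressive power can be made concrete with a toy sketch. It assumes a hypothetical template format in which a part written as `<Name>` is a slot filled by another template; each template alternative is then exactly a context-free production (here NP → the N | the N of NP). The `depth` bound exists only so that enumeration terminates and is not part of the grammar.

```python
import itertools

# Toy recursive template system (hypothetical, for illustration only).
# Each template name maps to alternative part lists; a part written as
# "<Name>" is a slot filled by another template, i.e. a nonterminal.
TEMPLATES = {
    "NP": [["the", "<N>"], ["the", "<N>", "of", "<NP>"]],  # recursive alternative
    "N": [["recording"], ["program"]],
}

def is_slot(part):
    return part.startswith("<") and part.endswith(">")

def expand(name, depth):
    """Yield every word list derivable from template `name` using at most
    `depth` levels of nested slot filling (the bound only ensures termination)."""
    if depth < 0:
        return
    for alternative in TEMPLATES[name]:
        # For every part, collect all word lists it can produce.
        options = [
            list(expand(part[1:-1], depth - 1)) if is_slot(part) else [[part]]
            for part in alternative
        ]
        # One result per combination of choices (none if some slot yields nothing).
        for combination in itertools.product(*options):
            yield [word for words in combination for word in words]

for words in expand("NP", 2):
    print(" ".join(words))
```

If no template refers, directly or indirectly, back to itself, `expand` with a sufficiently large depth enumerates a finite list of flat word sequences. This mirrors the converse direction mentioned above: compiling a non-recursive grammar and its lexicon into a plain template system.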

Hence, our approach to NLG is to build a system that involves all three types 
of hybridism and decides at runtime, given a specific dialog situation, which one 
to choose for the current utterance. After the utterance has been generated, 

1 Which is at least true on a sufficiently abstract level, as the input of language analysis 
is the output of language generation and vice versa. 
a feedback loop tries to determine whether the chosen processing branch has 
produced the desired result and updates the decision strategy accordingly. We 
will provide a more detailed description of our NLG system in the following 
section. 
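The feedback loop is described only at this level of abstraction, so the following is merely one possible reading under stated assumptions. The callables `shallow`, `deep`, and `is_acceptable` are hypothetical stand-ins, and the update rule (dropping an index entry whose template failed, so that the utterance type is routed to deep generation again) is our illustrative choice, not the strategy actually used in the system.

```python
def generate_with_feedback(key, index_table, shallow, deep, is_acceptable):
    """Route `key` to a processing branch, then adjust the routing table.

    shallow(template_id) and deep(key) return output strings; is_acceptable(output)
    is a hypothetical check of whether the output is the desired result.
    Returns a pair (output, name_of_branch_used).
    """
    template_id = index_table.get(key)
    if template_id is not None:
        output = shallow(template_id)
        if is_acceptable(output):
            return output, "shallow"
        # Feedback: the template did not produce the desired result, so remove
        # it from the index and fall back to deep generation.
        del index_table[key]
    return deep(key), "deep"
```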

3 NLG System Core 

In our generation system Hyperbug, all three types of hybrid NLG approaches 
introduced in sect. 2 are encompassed: The system utilizes canned text, templates, 
and deep NLG not only in parallel processing branches, but combines 
shallow processing with deep elements, deep processing with shallow elements, 
and parallel shallow and deep processing in one single system. The system 
core is displayed in fig. 1. 
Fig. 1. System Core. 



When an utterance has to be generated, the dialog manager sends a discourse 
representation structure (DRS) [8], a semantic representation of the utterance 
content, to the Decision Module which determines whether shallow or deep 
generation is more appropriate to produce the desired output. The decision mod- 
ule maintains an index table of available templates and canned text sentences. 
Deep generation is used only if canned text and shallow generation are unavail- 
able, i. e. if no pointer to an appropriate template in the index table is found. 
This way, the more complex pattern matching algorithm used by the shallow 
generation branch is not needed in the decision module: We want to avoid using 
any computational resources needed only for deep or shallow generation here. 
The decision is instead based on the values of certain XPath variables, resulting 
in a shallow but still sufficient analysis. The XPaths are potentially domain-specific 
and may have to be replaced when a domain change is envisaged, but only if 
the interface specifications are modified 2. Parallel shallow and deep processing 
branches combined with the decision module make our system hybrid type III 
in our classification. 

2 Such modifications have already proved necessary in the transition between two of 
our domains, cf. sect. 6. 
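A minimal sketch of such a decision module follows. It assumes a hypothetical XML layout for the DRS message (`<speechact>` and `<drs><pred>` elements) and uses Python's `xml.etree.ElementTree`, whose `find()` supports a limited XPath subset. The point it illustrates is that routing reduces to a cheap table lookup on a few extracted values, with no template pattern matching involved. The actual XPaths and index structure of Hyperbug are not given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths into the DRS message; ElementTree's find() supports
# a limited XPath subset, which suffices for simple child paths like these.
KEY_PATHS = ("speechact", "drs/pred")

def decision_key(drs_xml):
    """Build a lookup key from a few XPath values, without any pattern matching."""
    root = ET.fromstring(drs_xml)
    values = []
    for path in KEY_PATHS:
        node = root.find(path)
        values.append(node.text.strip() if node is not None and node.text else None)
    return tuple(values)

class DecisionModule:
    def __init__(self):
        self.index_table = {}  # decision key -> template (or canned text) identifier

    def route(self, drs_xml):
        """Return ("shallow", template_id) if a template is indexed, else ("deep", None)."""
        template_id = self.index_table.get(decision_key(drs_xml))
        if template_id is not None:
            return "shallow", template_id
        return "deep", None

    def register(self, drs_xml, template_id):
        """Update the index once a new template has become available."""
        self.index_table[decision_key(drs_xml)] = template_id
```

For a message such as `<utterance><speechact>inform</speechact><drs><pred>record</pred></drs></utterance>`, the first call to `route` selects deep generation; after `register` has been called for it, the same message is routed to the shallow branch.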

If shallow generation is selected, a Template System with an advanced 
pattern matching algorithm which uses a database of multipart templates 3 
further processes the input. The template database entries are modular (they 
can include other entries); they contain subparts and repeatable sections (for 
enumerations), syntactic features like category and agreement (for lexicon and 
morphology access) 4, and prosodic markers and pointers to wave files (for speech 
synthesis). With this enriched template system, our NLG approach becomes hybrid 
type I in our classification in sect. 2. 
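The following sketch illustrates two of the template features listed above: a repeatable section for enumerations and number agreement. The `Template` class and its field names are hypothetical, and a small dictionary of word forms stands in for the lexicon and morphology access the text describes. Prosodic markers and wave-file pointers are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """Hypothetical multipart template: a frame containing an {enum} slot, a
    repeatable item section, and number-dependent forms standing in for agreement."""
    frame: str
    item: str
    agreement: dict = field(default_factory=dict)  # slot -> {"sg": form, "pl": form}

def realize(template, items):
    """Fill the repeatable section once per item and apply number agreement."""
    if not items:
        raise ValueError("the repeatable section needs at least one item")
    parts = [template.item.format(**item) for item in items]
    enum = parts[0] if len(parts) == 1 else ", ".join(parts[:-1]) + " and " + parts[-1]
    number = "sg" if len(items) == 1 else "pl"
    forms = {slot: by_number[number] for slot, by_number in template.agreement.items()}
    return template.frame.format(enum=enum, **forms)

SCHEDULED = Template(
    frame="{enum} {aux} been scheduled for recording.",
    item='"{title}" at {time}',
    agreement={"aux": {"sg": "has", "pl": "have"}},
)

print(realize(SCHEDULED, [{"title": "The Day After", "time": "08:15 PM"}]))
```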

If the decision module opts for deep generation, a Sentence Planner converts 
the input DRS into an extended logical form (ELF). In addition to a conventional 
LF, the ELF may contain syntactic features like number, tense, mode, and 
subordinate clause type. Surface forms like proper nouns are also allowed; hence 
the deep generation module is of hybrid type II. The ELF is processed by the deep 
syntactic realization module: As the acronym Hyperbug 5 suggests, we have 
implemented an extended version of the Bottom-up Generation algorithm presented 
in [9] as our deep generation component. This component uses a unification 
grammar and reuses our system lexicon and morphology component, which were 
initially developed for parsing. During the generation process, the surface 
structure is produced and, at the same time, syntactic information is gathered by 
the unification algorithm and stored in an extended derivation tree. Its leaves 
contain the surface structures of the words which make up the generated sentence, 
their syntactic categories, and optionally subtype and agreement 
features. 
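An extended derivation tree of the kind described above might be represented as follows. This is a sketch under assumptions: the class names, the category labels (`PRON`, `AUX`, `V`, `PN`), and the feature encoding are hypothetical, and the unification-based generation that would build such a tree is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Leaf:
    surface: str                                   # surface form of the word
    category: str                                  # syntactic category
    subtype: Optional[str] = None                  # optional subtype
    agreement: dict = field(default_factory=dict)  # optional agreement features

@dataclass
class Node:
    category: str
    children: List[Union["Node", Leaf]]

def leaves(tree):
    """Yield the leaves of a derivation tree from left to right."""
    if isinstance(tree, Leaf):
        yield tree
    else:
        for child in tree.children:
            yield from leaves(child)

def surface_string(tree):
    """The generated sentence is the concatenation of the leaf surface forms."""
    return " ".join(leaf.surface for leaf in leaves(tree))

# Example tree; the proper noun enters as a surface form, as the ELF allows.
tree = Node("S", [
    Node("NP", [Leaf("I", "PRON", agreement={"num": "sg", "pers": 1})]),
    Node("VP", [Leaf("will", "AUX"), Leaf("record", "V", subtype="trans"),
                Node("NP", [Leaf("The Day After", "PN")])]),
])
print(surface_string(tree))
```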

After deep generation is complete, a module called Bridge generates a new 
template out of the derivation tree and feeds it back to the shallow generation 
branch for further use in subsequent dialogs. The decision module is also 
requested to update its index table of available templates; thus, the deep 
processing branch is no longer needed for generating utterances of the same type. 
This way, the workload on deep generation is steadily reduced at runtime. 
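The Bridge step can be sketched as turning the leaf sequence of the derivation tree into a template string and registering it, so that later requests with the same decision key take the shallow branch. Which leaves become slots is an assumption here (proper nouns and numbers, via the hypothetical `SLOT_CATEGORIES`), since the text does not specify the criterion; `template_db` and `index_table` are plain dictionaries standing in for the template database and the decision module's index table.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    surface: str
    category: str
    agreement: dict = field(default_factory=dict)

# Hypothetical choice of categories whose words vary between utterances of one type.
SLOT_CATEGORIES = {"PN", "NUM"}

def bridge(leaves):
    """Turn the leaf sequence of a derivation tree into a template string."""
    parts = []
    for position, leaf in enumerate(leaves):
        if leaf.category in SLOT_CATEGORIES:
            parts.append("{slot%d}" % position)  # variable word becomes a slot
        else:
            parts.append(leaf.surface)           # fixed word is kept verbatim
    return " ".join(parts)

def feed_back(leaves, key, template_db, index_table):
    """Store the new template and update the index so deep generation is skipped."""
    template_id = "tpl%d" % len(template_db)
    template_db[template_id] = bridge(leaves)
    index_table[key] = template_id
    return template_id
```

With the leaves `I/PRON will/AUX record/V "The Day After"/PN`, `bridge` yields `I will record {slot3}`, which can later be filled with any other title via `str.format`.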

With either of the two processing branches, Hyperbug produces a natural 
language sentence specification, which is sent to the speech synthesis agent; 
this agent converts it into a wave file and plays it to the user. 



4 Dialog Management and System Output Specification 

Our NLG system is not implemented as a stand-alone software module, but 
integrated into a generic multi-agent spoken language dialog system which contains 

3 Sect. 5 will give an example. 

4 Note that the lexicon and the morphology component are linguistic resources that are 
not normally used in templates, but rather reserved for deep generation. 

5 It stands for Hybrid Pragmatically Embedded Realization using Bottom-Up 
Generation. 
Fig. 2. Distributed System Architecture. 



agents for speech analysis, parsing and semantic construction, dialog management, 
speech output, and for the (domain-specific) technical system, which encapsulates 
application information and control. This technical system can range 
from a simple database querying interface in a slot-filling domain to a complex 
planner and problem solver in a more sophisticated domain such as electronic 
device management. 

Fig. 2 zooms out from fig. 1 and shows the distributed architecture of 
the dialog system in which the generation module is integrated. In this system, 
content for utterances can be determined in two places: On the one hand, pragmatic 
feedback to user requests is generated within the Technical system, which is 
also responsible for content generation in cases when system initiative is required. 
On the other hand, the Dialog Manager (DM) itself can generate content 
for utterances if the current dialog situation requires it (e. g. for the 
clarification of syntactic ambiguities in a previous user utterance). 

The DM is responsible for the integration of user and system utterances 
into the discourse context. From the NLG perspective, it also provides content 
determination (either on its own or with the help of the technical system, as 
stated above): It provides the generation module with a semantic specification 
of the utterance to be generated, which, however, lacks any form of linguistic 
knowledge. The DM is amodal, i. e. the language output part of the system is 
just another application for the DM, similar to the technical system. In our dialog 
system, the output specification is written as a DRS and encoded in XML 6. 



6 Technically, the XML structure is the content part of a message in the agent 
communication language KQML, but this is beyond our focus here. 
Fig. 3. Translation from the application to natural language via the semantic level. 



The two-stage translation process from the application domain to natural 
language in our dialog system is shown in fig. 3: The application domain concepts 
on the level of the technical system are shown at the bottom of the figure; they 
are mapped to linguistic concepts on the semantic level (depicted in the middle 
of the figure) by the dialog manager (DM), and these in turn are verbalized by 
the NLG module, i. e. translated to surface strings (shown at the top of fig. 3). 
The second stage of this translation process will be covered by our example in the 
next section. 
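The two stages can be illustrated with a toy pipeline for a VCR event like the one used in the next section. All names and mapping tables below are hypothetical; the sketch only shows the division of labor, with the dialog manager producing an amodal semantic structure and the NLG module alone choosing words and formats.

```python
from datetime import datetime

def dm_map(app_event):
    """Stage 1 (dialog manager): application concept -> amodal semantic structure."""
    # Hypothetical mapping from technical-system actions to semantic predicates;
    # no words or word forms are chosen at this stage.
    action_to_pred = {"timer_programmed": "record"}
    return {
        "pred": action_to_pred[app_event["action"]],
        "theme": app_event["title"],
        "time": datetime.fromisoformat(app_event["start"]),
    }

def nlg_verbalize(sem):
    """Stage 2 (NLG): semantic structure -> surface string."""
    verbs = {"record": "will be recorded"}  # hypothetical lexicalization table
    when = sem["time"].strftime("%I:%M %p")  # e.g. 08:15 PM in the default C locale
    return '"%s" %s at %s.' % (sem["theme"], verbs[sem["pred"]], when)

event = {"action": "timer_programmed", "title": "The Day After", "start": "2004-07-03T20:15"}
print(nlg_verbalize(dm_map(event)))
```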

The DM also makes an important decision in terms of discourse pragmatics, 
namely which speech act is most appropriate for the integration of the utterance 
to be generated into the current dialog context 7. For planning and realizing the 
utterance in detail, this information is passed to the generation module as well, 
because the generation module has to decide how the speech act and the content of 
the utterance can best be verbalized. 



5 Example Output Scenario 

In this section, we will give an overview of the output part of our dialog system 
by using an example in a home A/V management domain. This example itself is 
simple but nevertheless points to some interesting linguistic and implementation 
problems. In our example scenario, the user has just requested a VCR recording 
of the program “The Day After” tonight (on July 3rd, 2004) at 08:15 PM, and the 
system wants to inform the user that the request has been processed successfully 
(i. e. that the VCR has been programmed accordingly). 

7 This decision procedure is described in [10] and cannot be further discussed here. 




